Quantitative Finance

Course 3: Quantitative Finance & Portfolio Theory

Welcome to the Quantitative Finance course!

In this course, you'll master the mathematical foundations that power institutional trading systems, from statistical analysis to portfolio optimization and risk management.


  • Modules: 18
  • Duration: ~45 hours
  • Exercises: 108
  • Prerequisites: Course 0 (Python for Finance)

What You'll Build

By the end of this course, you'll have built:

  1. Statistical Analysis Tools - Analyze financial returns and fit distributions
  2. Portfolio Optimizer - Construct efficient portfolios using mean-variance optimization
  3. Risk Management System - Calculate VaR, CVaR, and stress test portfolios
  4. Monte Carlo Simulator - Simulate thousands of portfolio paths
  5. Performance Dashboard - Interactive analytics and attribution
  6. Production System - Deployed, monitored quantitative infrastructure

The Capstone Project integrates all these into a complete portfolio management system.

Course Structure

Part 1: Statistical Foundations (Modules 1-3)
    └── Statistics, Returns, Time Series
              │
              ▼
Part 2: Portfolio Theory (Modules 4-6)
    └── Basics, Optimization, Advanced Techniques
              │
              ▼
Part 3: Risk Modeling (Modules 7-9)
    └── VaR, Beyond VaR, Factor Models
              │
              ▼
Part 4: Simulation & Analytics (Modules 10-12)
    └── Monte Carlo, Attribution, Dashboards
              │
              ▼
Part 5: Production & Infrastructure (Modules 13-18)
    └── Reporting, Execution, Microstructure, HFT, Cloud, Operations
              │
              ▼
        CAPSTONE PROJECT

Module Overview

Part 1: Statistical Foundations

  1. Statistics for Finance: Descriptive stats, probability distributions, hypothesis testing, correlation
  2. Return Analysis: Simple vs log returns, annualization, Sharpe ratio, drawdowns
  3. Time Series Analysis: Stationarity, autocorrelation, ARIMA, volatility clustering

Part 2: Portfolio Theory

  4. Portfolio Basics: Risk/return tradeoff, diversification, efficient frontier
  5. Portfolio Optimization: Mean-variance, maximum Sharpe, minimum variance, constraints
  6. Advanced Techniques: Risk parity, Black-Litterman, robust optimization

Part 3: Risk Modeling

  7. Value at Risk: Historical VaR, parametric VaR, Monte Carlo VaR
  8. Beyond VaR: CVaR/Expected Shortfall, drawdown analysis, stress testing
  9. Factor Models: CAPM, Fama-French, PCA-based factors, factor attribution

Part 4: Simulation & Analytics

  10. Monte Carlo Simulation: GBM, correlated assets, option pricing, portfolio simulation
  11. Performance Attribution: Brinson attribution, factor attribution, contribution analysis
  12. Building Dashboards: Plotly, real-time metrics, interactive visualizations

Part 5: Production & Infrastructure

  13. Professional Reporting: Automated reports, PDF generation, scheduling
  14. Rebalancing & Execution: Calendar/threshold rebalancing, transaction costs, tax-loss harvesting
  15. Market Microstructure: Order books, bid-ask spread, price impact, optimal execution
  16. High-Frequency Concepts: Latency, co-location, HFT strategies, regulations
  17. Cloud Deployment: AWS/GCP, Docker, serverless, CI/CD
  18. 24/7 Operation: Monitoring, alerting, incident response, backup/recovery

Prerequisites Check

Before starting, ensure you can run the following code without errors:

# Prerequisites Check
import sys
print(f"Python version: {sys.version}")

# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Finance libraries
import yfinance as yf
from scipy import stats, optimize

print("\nAll prerequisites installed!")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
# Quick test: Download sample data
print("Testing data download...")
data = yf.download('SPY', period='1mo', progress=False)
print(f"Downloaded {len(data)} days of SPY data")
print("\nYou're ready to start Course 3!")

How to Use This Course

Learning Approach

  1. Read the concepts - Understand the theory before coding
  2. Run the examples - Execute all code cells to see results
  3. Do the exercises - Practice with guided and open-ended problems
  4. Check solutions - Compare your approach after attempting
  5. Build the project - Apply everything in the module project

Exercise Format

Each module has 6 exercises:

  • 3 Guided exercises: fill in the blanks, with hints provided
  • 3 Open-ended exercises: build complete solutions from scratch

Solutions are provided in collapsible sections - try first before peeking!

Time Commitment

  • Each module: ~2.5 hours
  • Recommended pace: 1-2 modules per session
  • Total course: ~45 hours

Let's Begin!

Start with Module 1: Statistics for Finance — good luck on your quantitative finance journey!

Module 1: Statistics for Finance

Course 3: Quantitative Finance & Portfolio Theory
Part 1: Statistical Foundations


Learning Objectives

By the end of this module, you will be able to:

  1. Calculate and interpret descriptive statistics for financial returns
  2. Apply probability distributions to model asset prices and returns
  3. Conduct hypothesis tests to validate trading strategies
  4. Compute and analyze correlation and covariance matrices

  • Duration: ~2.5 hours
  • Exercises: 6 (3 guided + 3 open-ended)
  • Prerequisites: Course 0 (Python for Finance)

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Financial Data

# Download stock data
tickers = ['SPY', 'AAPL', 'MSFT', 'GLD', 'TLT']
start_date = '2019-01-01'
end_date = '2024-01-01'

print(f'Downloading data for {tickers}...')
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle different yfinance column structures
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

returns = prices.pct_change().dropna()

print(f'Data range: {prices.index.min().date()} to {prices.index.max().date()}')
print(f'Trading days: {len(prices)}')
prices.tail()

Section 1.1: Descriptive Statistics

Before diving into complex models, we need to understand our data. Descriptive statistics give us a snapshot of what we're working with.

In this section, you will learn:

  • How to measure the "center" of returns (mean, median)
  • How to measure the "spread" of returns (variance, standard deviation)
  • How to measure the "shape" of returns (skewness, kurtosis)

1.1.1 Central Tendency: Mean vs Median

The mean is the average - simple to calculate but sensitive to outliers.

The median is the middle value - robust to outliers.

Why does this matter in finance?

  • A few extreme days (like March 2020) can significantly skew the mean
  • The median tells you what a "typical" day looks like

# Let's look at SPY returns
spy_returns = returns['SPY']

# Calculate mean and median
mean_ret = spy_returns.mean()
median_ret = spy_returns.median()

print('=== SPY Daily Returns ===')
print(f'Mean:   {mean_ret:.6f} (annualized: {mean_ret*252:.2%})')
print(f'Median: {median_ret:.6f}')
print(f'\nDifference: {(mean_ret - median_ret):.6f}')

1.1.2 Dispersion: Variance & Volatility

Returns fluctuate. We measure this with:

  • Variance (σ²): Average squared deviation from mean
  • Standard Deviation (σ): Square root of variance
  • Volatility: Annualized std = σ × √252

The key insight: Volatility is often used as a proxy for risk.

# Calculate volatility for all assets
print('=== Annualized Volatility ===')
print('(Higher = more risky)\n')

for ticker in tickers:
    daily_std = returns[ticker].std()
    annual_vol = daily_std * np.sqrt(252)
    print(f'{ticker}: {annual_vol:.1%}')

1.1.3 Shape: Skewness & Kurtosis

Beyond center and spread, the shape of returns matters enormously.

Skewness measures asymmetry:

  • Negative skew = More extreme losses than gains (bad for investors!)
  • Positive skew = More extreme gains than losses
  • Zero skew = Symmetric distribution

Kurtosis measures "tail thickness" (note: pandas reports excess kurtosis, so 0 corresponds to the Normal):

  • High kurtosis (>0) = Fat tails = More extreme events than expected
  • Zero kurtosis = Normal distribution tails
  • Negative kurtosis = Thin tails
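A quick sanity check of the convention: pandas' `.kurtosis()` uses Fisher's definition (excess kurtosis), so a large sample drawn from a Normal distribution should score near zero on both skewness and kurtosis. A minimal sketch with simulated data:

```python
import numpy as np
import pandas as pd

# pandas' .kurtosis() reports *excess* kurtosis (Fisher's definition),
# so a large Normal sample should land near 0 on both shape measures
rng = np.random.default_rng(0)
normal_sample = pd.Series(rng.normal(size=1_000_000))

print(f'Skewness: {normal_sample.skew():.3f}')    # near 0 (symmetric)
print(f'Kurtosis: {normal_sample.kurtosis():.3f}')  # near 0 (Normal tails)
```

Any asset whose kurtosis is well above zero has fatter tails than the Normal.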

# Calculate skewness and kurtosis
print('=== Return Distribution Shape ===')
print(f'{"Asset":<6} {"Skewness":>10} {"Kurtosis":>10}')
print('-' * 28)

for ticker in tickers:
    skew = returns[ticker].skew()
    kurt = returns[ticker].kurtosis()
    print(f'{ticker:<6} {skew:>10.2f} {kurt:>10.2f}')

Visualizing Return Distributions

# Visualize SPY return distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram with normal overlay
ax1 = axes[0]
ax1.hist(spy_returns, bins=50, density=True, alpha=0.7, 
         color='steelblue', edgecolor='white', label='Actual SPY')
x = np.linspace(spy_returns.min(), spy_returns.max(), 100)
ax1.plot(x, stats.norm.pdf(x, mean_ret, spy_returns.std()), 
         'r-', lw=2, label='Normal Distribution')
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax1.set_xlabel('Daily Return')
ax1.set_ylabel('Density')
ax1.set_title('SPY Returns vs Normal Distribution')
ax1.legend()

# QQ-plot
ax2 = axes[1]
stats.probplot(spy_returns, dist='norm', plot=ax2)
ax2.set_title('Q-Q Plot: Are Returns Normal?')

plt.tight_layout()
plt.show()

Exercise 1.1: Calculate Descriptive Statistics (Guided)

Your Task: Complete the function to calculate key statistics for a return series.

Fill in the blanks to calculate mean, volatility, skewness, and kurtosis:

Exercise
Click to reveal solution
def calculate_return_stats(returns_series: pd.Series) -> dict:
    """Calculate descriptive statistics for a return series."""
    mean_daily = returns_series.mean()
    volatility = returns_series.std() * np.sqrt(252)
    skewness = returns_series.skew()
    kurtosis = returns_series.kurtosis()

    return {
        'mean_daily': mean_daily,
        'mean_annual': mean_daily * 252,
        'volatility': volatility,
        'skewness': skewness,
        'kurtosis': kurtosis
    }

# Test
result = calculate_return_stats(returns['SPY'])
print(f"SPY Annual Return: {result['mean_annual']:.2%}")
print(f"SPY Volatility: {result['volatility']:.2%}")

Section 1.2: Probability Distributions

Now that we understand our data's shape, let's formalize it with probability distributions.

In this section, you will learn:

  • Why the Normal distribution is useful (and where it fails)
  • How the Student's t-distribution handles fat tails
  • How to test if your data follows a specific distribution

1.2.1 The Normal Distribution

Despite its limitations, the Normal distribution is foundational:

  • Parameters: mean (μ) and standard deviation (σ)
  • Properties: 68-95-99.7 rule
  • Use cases: Central Limit Theorem, log returns over long periods
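The 68-95-99.7 rule can be verified directly from the Normal CDF:

```python
from scipy import stats

# Probability mass within k standard deviations of the mean of a Normal
for k in [1, 2, 3]:
    p = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f'Within {k} sigma: {p:.4f}')
# prints approximately 0.6827, 0.9545, 0.9973 -- the 68-95-99.7 rule
```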
# How often do extreme events occur under a Normal distribution?
print('=== Probability of Extreme Events (Normal Distribution) ===')
print()

spy_std = spy_returns.std()
spy_mean = spy_returns.mean()

for sigma in [2, 3, 4, 5]:
    prob = 2 * (1 - stats.norm.cdf(sigma))  # Two-tailed
    expected_days = int(1 / prob) if prob > 0 else float('inf')
    print(f'{sigma}-sigma event: {prob:.2e} probability')
    print(f'   Expected once every {expected_days:,} trading days ({expected_days/252:.0f} years)')
    print()
# Count actual extreme events in SPY
print('=== Actual vs Expected Extreme Events in SPY ===')
print()

for sigma in [2, 3, 4, 5]:
    threshold = sigma * spy_std
    extreme_days = spy_returns[abs(spy_returns - spy_mean) > threshold]
    count = len(extreme_days)
    
    normal_prob = 2 * (1 - stats.norm.cdf(sigma))
    expected_count = normal_prob * len(spy_returns)
    
    print(f'{sigma}-sigma events: Actual={count}, Expected={expected_count:.1f}, Ratio={count/max(expected_count, 0.1):.1f}x')

1.2.2 The Student's t-Distribution

The t-distribution is like the Normal but with fatter tails.

  • Key parameter: degrees of freedom (df or ν)
  • Lower df = fatter tails
  • As df → ∞, t-distribution → Normal distribution
  • Financial returns typically fit with df = 3 to 8
# Compare Normal vs t-distribution
fig, ax = plt.subplots(figsize=(12, 6))

x = np.linspace(-5, 5, 1000)

ax.plot(x, stats.norm.pdf(x), 'b-', lw=2, label='Normal')
ax.plot(x, stats.t.pdf(x, df=3), 'r-', lw=2, label='t (df=3)')
ax.plot(x, stats.t.pdf(x, df=5), 'g-', lw=2, label='t (df=5)')
ax.plot(x, stats.t.pdf(x, df=10), 'orange', lw=2, label='t (df=10)')

ax.set_xlabel('Standard Deviations from Mean')
ax.set_ylabel('Probability Density')
ax.set_title('Normal vs Student\'s t-Distribution')
ax.legend()
ax.set_xlim(-5, 5)

plt.tight_layout()
plt.show()

print('Notice: Lower degrees of freedom = fatter tails = more extreme events')
# Fit t-distribution to SPY returns
df_fit, loc_fit, scale_fit = stats.t.fit(spy_returns)

print('=== Fitted t-Distribution Parameters ===')
print(f'Degrees of freedom: {df_fit:.2f}')
print(f'Location (mean):    {loc_fit:.6f}')
print(f'Scale (std):        {scale_fit:.6f}')
print(f'\nInterpretation: df={df_fit:.1f} confirms fat tails in SPY returns')

1.2.3 Testing for Normality

Common tests:

  • Jarque-Bera test: Based on skewness and kurtosis
  • Shapiro-Wilk test: Compares data to Normal quantiles

Interpretation:

  • p-value < 0.05 → Reject normality (data is NOT normal)
  • p-value ≥ 0.05 → Cannot reject normality

# Test for normality
print('=== Normality Tests ===')
print()

for ticker in tickers:
    ret = returns[ticker]
    jb_stat, jb_pval = stats.jarque_bera(ret)
    is_normal = 'YES' if jb_pval > 0.05 else 'NO'
    print(f'{ticker}: Jarque-Bera p-value={jb_pval:.2e}, Normal? {is_normal}')

Exercise 1.2: Fit a t-Distribution (Guided)

Your Task: Complete the function to fit a t-distribution and compare it to the Normal.

Fill in the blanks:

Exercise
Click to reveal solution
def fit_and_compare_distributions(returns_series: pd.Series) -> dict:
    """Fit Normal and t-distribution, compare the fits."""
    norm_mean = returns_series.mean()
    norm_std = returns_series.std()
    t_df, t_loc, t_scale = stats.t.fit(returns_series)
    jb_stat, jb_pval = stats.jarque_bera(returns_series)

    return {
        'norm_mean': norm_mean,
        'norm_std': norm_std,
        't_df': t_df,
        't_loc': t_loc,
        't_scale': t_scale,
        'is_normal': jb_pval > 0.05,
        'jb_pval': jb_pval
    }

# Test
for ticker in tickers:
    result = fit_and_compare_distributions(returns[ticker])
    print(f"{ticker}: t-dist df={result['t_df']:.2f}, Normal? {result['is_normal']}")

Section 1.3: Hypothesis Testing

How do we know if a trading strategy actually works, or if we just got lucky?

Hypothesis testing helps us distinguish skill from randomness.

In this section, you will learn:

  • How to formulate null and alternative hypotheses
  • How to conduct t-tests on financial returns
  • How to interpret p-values correctly

1.3.1 The Hypothesis Testing Framework

Null Hypothesis (H₀): The default assumption (usually "no effect")

  • Example: "My strategy's mean return is zero"

Alternative Hypothesis (H₁): What we're trying to prove

  • Example: "My strategy's mean return is positive"

The p-value: Probability of seeing our result (or something more extreme) if H₀ is true

  • p < 0.05 → Reject H₀ (result is "statistically significant")
  • p ≥ 0.05 → Cannot reject H₀
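One way to internalize the p-value: when H₀ is actually true, a test at α = 0.05 should still reject about 5% of the time purely by chance. A small simulation with made-up zero-mean "returns" illustrates this:

```python
import numpy as np
from scipy import stats

# Simulate many zero-mean return series: H0 is TRUE by construction,
# so a test at alpha = 0.05 should falsely reject roughly 5% of the time
rng = np.random.default_rng(42)
n_trials, rejections = 1000, 0
for _ in range(n_trials):
    noise = rng.normal(loc=0.0, scale=0.01, size=252)  # one "year" of noise
    _, p = stats.ttest_1samp(noise, 0)
    if p < 0.05:
        rejections += 1

print(f'False rejection rate: {rejections / n_trials:.1%}')  # close to 5%
```

This is also why testing many strategies and keeping the "significant" ones is dangerous: some will pass by luck alone.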

# Test if SPY has statistically significant positive returns
print('=== One-Sample t-Test: Is SPY Mean Return Zero? ===')
print()

t_stat, p_value = stats.ttest_1samp(spy_returns, 0)

print(f'Sample mean:     {spy_returns.mean():.6f}')
print(f'Sample size:     {len(spy_returns)}')
print(f't-statistic:     {t_stat:.4f}')
print(f'p-value:         {p_value:.4f}')
print()

if p_value < 0.05:
    print('Result: REJECT null hypothesis - SPY has significant non-zero returns')
else:
    print('Result: CANNOT reject null hypothesis')

1.3.2 Two-Sample t-Test: Comparing Returns

Often we want to compare two assets or two strategies.

Question: Does AAPL outperform SPY?

# Compare AAPL vs SPY
print('=== Two-Sample t-Test: AAPL vs SPY ===')
print()

aapl_ret = returns['AAPL']
spy_ret = returns['SPY']

t_stat, p_value = stats.ttest_ind(aapl_ret, spy_ret)

print(f'AAPL mean: {aapl_ret.mean()*252:.2%} annualized')
print(f'SPY mean:  {spy_ret.mean()*252:.2%} annualized')
print(f't-statistic: {t_stat:.4f}')
print(f'p-value:     {p_value:.4f}')
print()

if p_value < 0.05:
    print('Result: Significant difference between AAPL and SPY')
else:
    print('Result: No significant difference')
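One caveat on the test above: AAPL and SPY returns are observed on the same dates, so the two samples are paired rather than independent. A paired test on the daily differences cancels the shared market noise and typically has more power. A sketch with synthetic (hypothetical) return series:

```python
import numpy as np
from scipy import stats

# Two return series observed on the same dates are paired, not independent.
# ttest_rel tests whether the daily *difference* has zero mean.
rng = np.random.default_rng(7)
market = rng.normal(0.0004, 0.010, size=1000)           # hypothetical benchmark
asset = market + rng.normal(0.0005, 0.005, size=1000)   # correlated, small extra drift

t_ind, p_ind = stats.ttest_ind(asset, market)  # ignores the pairing
t_rel, p_rel = stats.ttest_rel(asset, market)  # exploits the pairing

print(f'Independent-samples p-value: {p_ind:.4f}')
print(f'Paired-samples p-value:      {p_rel:.4f}')  # smaller: shared noise cancels
```

With real data, the same idea applies: align the two series on common dates and use `stats.ttest_rel`.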

Exercise 1.3: Hypothesis Test Setup (Guided)

Your Task: Complete the function to perform hypothesis tests on return series.

Fill in the blanks:

Exercise
Click to reveal solution
def test_returns(returns1: pd.Series, returns2: pd.Series = None, 
                 test_value: float = 0, alpha: float = 0.05) -> dict:
    """Perform one-sample or two-sample t-test on returns."""

    if returns2 is None:
        t_stat, p_val = stats.ttest_1samp(returns1, test_value)
        test_type = 'one-sample'
    else:
        t_stat, p_val = stats.ttest_ind(returns1, returns2)
        test_type = 'two-sample'

    is_significant = p_val < alpha

    return {
        'test_type': test_type,
        't_statistic': t_stat,
        'p_value': p_val,
        'is_significant': is_significant,
        'alpha': alpha
    }

# Test
result = test_returns(returns['SPY'])
print(f"SPY vs Zero: p={result['p_value']:.4f}, Significant? {result['is_significant']}")

result = test_returns(returns['GLD'], returns['TLT'])
print(f"GLD vs TLT:  p={result['p_value']:.4f}, Significant? {result['is_significant']}")

Exercise 1.4: Complete Statistical Analysis (Open-ended)

Your Task:

Build a function that performs a complete statistical analysis of a return series:

  • Calculate all descriptive statistics (mean, std, skewness, kurtosis)
  • Fit a t-distribution and report degrees of freedom
  • Test if the mean return is significantly different from zero
  • Return all results in a dictionary

Your implementation:

Exercise
Click to reveal solution
def complete_statistical_analysis(returns_series: pd.Series, name: str = 'Asset') -> dict:
    """Perform complete statistical analysis of a return series."""

    # Descriptive statistics
    desc_stats = {
        'mean_daily': returns_series.mean(),
        'mean_annual': returns_series.mean() * 252,
        'std_daily': returns_series.std(),
        'volatility_annual': returns_series.std() * np.sqrt(252),
        'skewness': returns_series.skew(),
        'kurtosis': returns_series.kurtosis()
    }

    # Distribution fit
    t_df, t_loc, t_scale = stats.t.fit(returns_series)
    jb_stat, jb_pval = stats.jarque_bera(returns_series)

    dist_stats = {
        't_degrees_freedom': t_df,
        'is_normal': jb_pval > 0.05,
        'normality_pval': jb_pval
    }

    # Hypothesis test
    t_stat, p_val = stats.ttest_1samp(returns_series, 0)

    hyp_stats = {
        't_statistic': t_stat,
        'p_value': p_val,
        'significant_returns': p_val < 0.05
    }

    return {
        'name': name,
        'n_observations': len(returns_series),
        'descriptive': desc_stats,
        'distribution': dist_stats,
        'hypothesis_test': hyp_stats
    }

# Test
analysis = complete_statistical_analysis(returns['MSFT'], 'MSFT')
print(f"=== {analysis['name']} Analysis ===")
print(f"Annual Return: {analysis['descriptive']['mean_annual']:.2%}")
print(f"Volatility: {analysis['descriptive']['volatility_annual']:.2%}")
print(f"t-dist df: {analysis['distribution']['t_degrees_freedom']:.2f}")
print(f"Significant? {analysis['hypothesis_test']['significant_returns']}")

Section 1.4: Correlation & Covariance

Understanding how assets move together is fundamental to portfolio construction.

In this section, you will learn:

  • The difference between covariance and correlation
  • How to compute and interpret correlation matrices
  • Why correlation matters for diversification

1.4.1 Covariance and Correlation

Covariance measures joint variability but is scale-dependent.

Correlation standardizes covariance to the range -1 to +1:

  • ρ = +1: Perfect positive correlation
  • ρ = 0: No linear correlation
  • ρ = -1: Perfect negative correlation
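A small sketch with simulated series makes the scale-dependence concrete: quoting returns in percent instead of decimals multiplies the covariance by 10,000 but leaves the correlation untouched:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=1000))            # simulated return series
y = 0.5 * x + pd.Series(rng.normal(size=1000))  # related series plus noise

# Covariance depends on units; correlation is standardized
print(f'Cov  (decimals): {x.cov(y):.4f}')
print(f'Cov  (percent):  {(100 * x).cov(100 * y):.1f}')   # 10,000x larger
print(f'Corr (decimals): {x.corr(y):.3f}')
print(f'Corr (percent):  {(100 * x).corr(100 * y):.3f}')  # unchanged
```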

# Calculate correlation matrix
corr_matrix = returns.corr()

print('=== Correlation Matrix ===')
print(corr_matrix.round(3))
# Visualize correlations
fig, ax = plt.subplots(figsize=(10, 8))

im = ax.imshow(corr_matrix, cmap='RdBu_r', vmin=-1, vmax=1)

ax.set_xticks(range(len(tickers)))
ax.set_yticks(range(len(tickers)))
ax.set_xticklabels(tickers)
ax.set_yticklabels(tickers)

for i in range(len(tickers)):
    for j in range(len(tickers)):
        ax.text(j, i, f'{corr_matrix.iloc[i, j]:.2f}',
                ha='center', va='center', color='black', fontsize=12)

ax.set_title('Asset Correlation Matrix', fontsize=14, fontweight='bold')
plt.colorbar(im, label='Correlation')
plt.tight_layout()
plt.show()

1.4.2 Diversification Benefits

When correlation < 1, portfolio risk < weighted average risk.

This is why diversification works!
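The effect follows directly from the two-asset portfolio variance formula, σp² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂. A sketch with illustrative (made-up) volatilities shows how lowering ρ lowers portfolio risk:

```python
import numpy as np

# Hypothetical inputs: two assets at 15% and 12% annual vol, 50/50 weights
w1, w2 = 0.5, 0.5
s1, s2 = 0.15, 0.12

# sigma_p^2 = w1^2*s1^2 + w2^2*s2^2 + 2*w1*w2*rho*s1*s2
for rho in [1.0, 0.5, 0.0, -0.5]:
    var_p = (w1 * s1)**2 + (w2 * s2)**2 + 2 * w1 * w2 * rho * s1 * s2
    print(f'rho = {rho:+.1f}: portfolio vol = {np.sqrt(var_p):.2%}')
# at rho = 1.0 the portfolio vol equals the weighted average, 13.50%;
# every rho below 1 gives a vol below that
```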

# Demonstrate diversification
print('=== Diversification Benefit ===')
print()

spy_vol = returns['SPY'].std() * np.sqrt(252)
tlt_vol = returns['TLT'].std() * np.sqrt(252)
correlation = returns['SPY'].corr(returns['TLT'])

print(f'SPY volatility: {spy_vol:.1%}')
print(f'TLT volatility: {tlt_vol:.1%}')
print(f'Correlation:    {correlation:.3f}')
print()

# 50/50 portfolio
weighted_avg_vol = 0.5 * spy_vol + 0.5 * tlt_vol
portfolio_returns = 0.5 * returns['SPY'] + 0.5 * returns['TLT']
actual_portfolio_vol = portfolio_returns.std() * np.sqrt(252)

print(f'50/50 Portfolio:')
print(f'  If correlation=1: {weighted_avg_vol:.1%}')
print(f'  Actual:           {actual_portfolio_vol:.1%}')
print(f'  Benefit:          {weighted_avg_vol - actual_portfolio_vol:.1%} reduction')

1.4.3 Rolling Correlations

Warning: Correlations change over time, especially during market stress.

# Rolling correlations
window = 60

rolling_corr_spy_tlt = returns['SPY'].rolling(window).corr(returns['TLT'])
rolling_corr_spy_gld = returns['SPY'].rolling(window).corr(returns['GLD'])

fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(rolling_corr_spy_tlt.index, rolling_corr_spy_tlt, label='SPY-TLT', linewidth=1.5)
ax.plot(rolling_corr_spy_gld.index, rolling_corr_spy_gld, label='SPY-GLD', linewidth=1.5, alpha=0.8)

ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.set_xlabel('Date')
ax.set_ylabel('60-Day Rolling Correlation')
ax.set_title('Rolling Correlations Over Time')
ax.legend()
ax.set_ylim(-1, 1)

plt.tight_layout()
plt.show()

Exercise 1.5: Build a Correlation Analyzer (Open-ended)

Your Task:

Build a class that analyzes correlations between assets:

  • Calculate the full correlation matrix
  • Find the pair with minimum correlation (best for diversification)
  • Find the pair with maximum correlation (highest risk concentration)
  • Calculate rolling correlations for a given window

Your implementation:

Exercise
Click to reveal solution
class CorrelationAnalyzer:
    """Analyze correlations between assets."""

    def __init__(self, returns_df: pd.DataFrame):
        self.returns = returns_df
        self.tickers = returns_df.columns.tolist()
        self.corr_matrix = returns_df.corr()

    def get_correlation_matrix(self) -> pd.DataFrame:
        """Return the correlation matrix."""
        return self.corr_matrix

    def find_min_correlation_pair(self) -> tuple:
        """Find the pair with minimum correlation."""
        corr_values = self.corr_matrix.values.copy()
        np.fill_diagonal(corr_values, 1)  # Ignore diagonal

        min_idx = np.unravel_index(np.argmin(corr_values), corr_values.shape)
        asset1 = self.tickers[min_idx[0]]
        asset2 = self.tickers[min_idx[1]]
        min_corr = self.corr_matrix.loc[asset1, asset2]

        return (asset1, asset2, min_corr)

    def find_max_correlation_pair(self) -> tuple:
        """Find the pair with maximum correlation (excluding self)."""
        corr_values = self.corr_matrix.values.copy()
        np.fill_diagonal(corr_values, -1)  # Ignore diagonal

        max_idx = np.unravel_index(np.argmax(corr_values), corr_values.shape)
        asset1 = self.tickers[max_idx[0]]
        asset2 = self.tickers[max_idx[1]]
        max_corr = self.corr_matrix.loc[asset1, asset2]

        return (asset1, asset2, max_corr)

    def rolling_correlation(self, asset1: str, asset2: str, window: int = 60) -> pd.Series:
        """Calculate rolling correlation between two assets."""
        return self.returns[asset1].rolling(window).corr(self.returns[asset2])

# Test
analyzer = CorrelationAnalyzer(returns)
min_pair = analyzer.find_min_correlation_pair()
max_pair = analyzer.find_max_correlation_pair()
print(f'Best diversification: {min_pair[0]}-{min_pair[1]} (corr={min_pair[2]:.3f})')
print(f'Highest correlation:  {max_pair[0]}-{max_pair[1]} (corr={max_pair[2]:.3f})')

Exercise 1.6: Diversification Calculator (Open-ended)

Your Task:

Build a function that calculates the diversification benefit of combining two assets:

  • Calculate individual asset volatilities
  • Calculate the 50/50 portfolio volatility
  • Calculate the "weighted average" volatility (if correlation = 1)
  • Calculate the diversification benefit (reduction in volatility)
  • Return the percentage improvement

Your implementation:

Exercise
Click to reveal solution
def calculate_diversification_benefit(returns_df: pd.DataFrame, 
                                       asset1: str, 
                                       asset2: str,
                                       weight1: float = 0.5) -> dict:
    """Calculate diversification benefit of combining two assets."""

    weight2 = 1 - weight1

    # Individual volatilities
    vol1 = returns_df[asset1].std() * np.sqrt(252)
    vol2 = returns_df[asset2].std() * np.sqrt(252)

    # Correlation
    correlation = returns_df[asset1].corr(returns_df[asset2])

    # Portfolio volatility
    port_returns = weight1 * returns_df[asset1] + weight2 * returns_df[asset2]
    port_vol = port_returns.std() * np.sqrt(252)

    # Weighted average (if correlation = 1)
    weighted_avg_vol = weight1 * vol1 + weight2 * vol2

    # Diversification benefit
    benefit = weighted_avg_vol - port_vol
    benefit_pct = benefit / weighted_avg_vol

    return {
        'asset1': asset1,
        'asset2': asset2,
        'correlation': correlation,
        'portfolio_vol': port_vol,
        'weighted_avg_vol': weighted_avg_vol,
        'diversification_benefit': benefit,
        'benefit_percentage': benefit_pct
    }

# Test
pairs = [('SPY', 'TLT'), ('SPY', 'GLD'), ('AAPL', 'MSFT')]
for a1, a2 in pairs:
    result = calculate_diversification_benefit(returns, a1, a2)
    print(f'{a1}/{a2}: corr={result["correlation"]:.3f}, benefit={result["benefit_percentage"]:.1%}')

Module Project: Statistical Analysis Report

Put together everything you've learned!

Your Challenge:

Create a comprehensive statistical analysis report for QQQ (Nasdaq 100 ETF):

  1. Descriptive Statistics: Mean, std, skewness, kurtosis
  2. Distribution Fit: Fit a t-distribution and interpret degrees of freedom
  3. Hypothesis Test: Test if QQQ has significantly different returns than SPY
  4. Correlation Analysis: How correlated is QQQ with our other assets?
# Module Project: Your implementation here
Click to reveal solution
# Download QQQ data
print('Downloading QQQ data...')
qqq_data = yf.download('QQQ', start=start_date, end=end_date, progress=False)

if isinstance(qqq_data.columns, pd.MultiIndex):
    qqq_prices = qqq_data['Close']['QQQ'] if 'Close' in qqq_data.columns.get_level_values(0) else qqq_data.iloc[:, 0]
else:
    qqq_prices = qqq_data['Close'] if 'Close' in qqq_data.columns else qqq_data['Adj Close']

qqq_returns = qqq_prices.pct_change().dropna()

print('\n' + '='*60)
print('QQQ STATISTICAL ANALYSIS REPORT')
print('='*60)

# 1. Descriptive Statistics
print('\n--- 1. DESCRIPTIVE STATISTICS ---')
print(f'Annual Return: {qqq_returns.mean()*252:.2%}')
print(f'Volatility:    {qqq_returns.std()*np.sqrt(252):.2%}')
print(f'Skewness:      {qqq_returns.skew():.2f}')
print(f'Kurtosis:      {qqq_returns.kurtosis():.2f}')

# 2. Distribution Fit
print('\n--- 2. DISTRIBUTION FIT ---')
t_df, t_loc, t_scale = stats.t.fit(qqq_returns)
jb_stat, jb_pval = stats.jarque_bera(qqq_returns)
print(f't-distribution df: {t_df:.2f}')
print(f'Normal? {jb_pval > 0.05}')

# 3. Hypothesis Test vs SPY
print('\n--- 3. HYPOTHESIS TEST: QQQ vs SPY ---')
common_idx = qqq_returns.index.intersection(returns['SPY'].index)
t_stat, p_val = stats.ttest_ind(qqq_returns.loc[common_idx], returns['SPY'].loc[common_idx])
print(f'p-value: {p_val:.4f}')
print(f'Significant difference? {p_val < 0.05}')

# 4. Correlations
print('\n--- 4. CORRELATION ANALYSIS ---')
for ticker in tickers:
    corr = qqq_returns.loc[common_idx].corr(returns[ticker].loc[common_idx])
    print(f'QQQ vs {ticker}: {corr:.3f}')

print('\n' + '='*60)
print('END OF REPORT')
print('='*60)

Key Takeaways

What You Learned

1. Descriptive Statistics

  • Mean vs Median: Mean is affected by outliers; median shows "typical" values
  • Volatility: Annualized standard deviation is the standard risk measure
  • Skewness: Negative skew means more extreme losses (common in stocks)
  • Kurtosis: High kurtosis means fat tails and more extreme events

2. Probability Distributions

  • Financial returns are NOT normal: They have fat tails
  • t-distribution: Better fits financial data (3-8 degrees of freedom typical)
  • Risk models assuming normality underestimate extreme events

3. Hypothesis Testing

  • p-value < 0.05: Reject null hypothesis (result is "significant")
  • Financial data is noisy: Hard to prove statistical significance
  • Statistical significance ≠ Practical significance

4. Correlation & Covariance

  • Correlation ranges from -1 to +1: Easier to interpret than covariance
  • Diversification works when correlation < 1
  • Correlations change over time: Especially during crises

Coming Up Next

In Module 2: Return Analysis, we'll dive deeper into:

  • Simple vs log returns
  • Annualization of returns and risk
  • Risk-adjusted performance metrics (Sharpe, Sortino)
  • Benchmark comparison and Alpha/Beta


Congratulations on completing Module 1! You now have the statistical foundation for quantitative finance.

Module 2: Return Analysis

Course 3: Quantitative Finance & Portfolio Theory
Part 1: Statistical Foundations


Learning Objectives

By the end of this module, you will be able to:

  1. Calculate and interpret simple vs logarithmic returns
  2. Properly annualize returns and volatility
  3. Compute risk-adjusted performance metrics (Sharpe, Sortino, Calmar)
  4. Compare strategies against benchmarks using Alpha, Beta, and Information Ratio

  • Duration: ~2.5 hours
  • Exercises: 6 (3 guided + 3 open-ended)
  • Prerequisites: Module 1 (Statistics for Finance)

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Financial Data

# Download stock data
tickers = ['SPY', 'AAPL', 'MSFT', 'GLD', 'TLT']
start_date = '2019-01-01'
end_date = '2024-01-01'

print(f'Downloading data for {tickers}...')
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle different yfinance column structures
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

# Calculate returns
simple_returns = prices.pct_change().dropna()
log_returns = np.log(prices / prices.shift(1)).dropna()

print(f'Data range: {prices.index.min().date()} to {prices.index.max().date()}')
print(f'Trading days: {len(prices)}')
prices.tail()

Section 2.1: Types of Returns

Not all returns are created equal! The way you calculate returns affects everything downstream.

In this section, you will learn:

  • The difference between simple and log returns
  • When to use each type
  • Why this choice matters for your analysis

2.1.1 Simple Returns (Arithmetic Returns)

Formula: R = (P₁ - P₀) / P₀ = P₁/P₀ - 1

Pros:

  • Intuitive to understand
  • Additive across assets (for portfolio returns)

Cons:

  • Not additive across time: you can't simply sum daily returns to get the total return

# Calculate simple returns
print('=== Simple (Arithmetic) Returns ===')
print()
print('First 5 days of SPY simple returns:')
print(simple_returns['SPY'].head())
print(f'\nSum of all simple returns: {simple_returns["SPY"].sum():.2%}')
# Compare sum of returns vs actual total return
spy_prices = prices['SPY']

# Actual total return
actual_total = (spy_prices.iloc[-1] / spy_prices.iloc[0]) - 1

# Sum of simple returns (WRONG approach)
sum_of_returns = simple_returns['SPY'].sum()

# Compounded returns (CORRECT approach)
compounded = (1 + simple_returns['SPY']).prod() - 1

print('=== Total Return Calculation ===')
print()
print(f'Actual total return:     {actual_total:.2%}')
print(f'Sum of simple returns:   {sum_of_returns:.2%}  ← WRONG!')
print(f'Compounded returns:      {compounded:.2%}  ← CORRECT!')
print()
print('Lesson: Never sum simple returns to get total return!')

2.1.2 Log Returns (Continuously Compounded Returns)

Formula: r = ln(P₁/P₀) = ln(P₁) - ln(P₀)

Pros:

  • Additive across time (sum of log returns = total log return)
  • Symmetric: doubling (+100%) and halving (-50%) produce log returns of equal magnitude and opposite sign
  • Better statistical properties (closer to normal)

Cons:

  • Not additive across assets
  • Less intuitive
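The cross-asset caveat can be made concrete with a toy one-day, 50/50 portfolio (made-up returns): averaging simple returns reproduces the portfolio return exactly, while averaging log returns does not.

```python
import numpy as np

r_a, r_b = 0.10, -0.05   # made-up one-day simple returns for two assets
w = 0.5                  # equal weights

# Portfolio simple return: exact weighted average of simple returns
portfolio_simple = w * r_a + w * r_b

# Averaging log returns and converting back does NOT give the same answer
avg_log = w * np.log(1 + r_a) + w * np.log(1 + r_b)
portfolio_from_log = np.exp(avg_log) - 1

print(f'Weighted simple returns:  {portfolio_simple:.4%}')
print(f'Via averaged log returns: {portfolio_from_log:.4%}')
```

The gap is small for daily-sized returns but real, which is why portfolio aggregation is done with simple returns.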

# Log returns ARE additive across time!
actual_log_total = np.log(spy_prices.iloc[-1] / spy_prices.iloc[0])
sum_of_log = log_returns['SPY'].sum()

print('=== Log Returns: Time Additivity ===')
print()
print(f'Actual log total return: {actual_log_total:.4f}')
print(f'Sum of log returns:      {sum_of_log:.4f}')
print(f'Difference:              {abs(actual_log_total - sum_of_log):.6f}')
print()
print('They match! Log returns can be summed across time.')

2.1.3 Converting Between Return Types

  • Simple to Log: r = ln(1 + R)
  • Log to Simple: R = e^r - 1
# Convert between return types
print('=== Return Conversion ===')
print()

# Take first return as example
simple_r = simple_returns['SPY'].iloc[0]
log_r = log_returns['SPY'].iloc[0]

print(f'Original simple return:  {simple_r:.6f}')
print(f'Original log return:     {log_r:.6f}')
print()

# Convert simple to log
simple_to_log = np.log(1 + simple_r)
print(f'Simple → Log: ln(1 + {simple_r:.6f}) = {simple_to_log:.6f}')

# Convert log to simple
log_to_simple = np.exp(log_r) - 1
print(f'Log → Simple: e^{log_r:.6f} - 1 = {log_to_simple:.6f}')

2.1.4 When to Use Each Type?

Use Case Return Type Reason
Multi-period analysis Log Additive across time
Portfolio returns Simple Additive across assets
Statistical modeling Log Better distributional properties
Reporting to clients Simple More intuitive
# Visual comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Distribution comparison
ax1 = axes[0]
ax1.hist(simple_returns['SPY'], bins=50, alpha=0.6, label='Simple', density=True)
ax1.hist(log_returns['SPY'], bins=50, alpha=0.6, label='Log', density=True)
ax1.set_xlabel('Return')
ax1.set_ylabel('Density')
ax1.set_title('Simple vs Log Returns Distribution')
ax1.legend()

# Scatter plot showing relationship
ax2 = axes[1]
ax2.scatter(simple_returns['SPY'], log_returns['SPY'], alpha=0.3, s=10)
ax2.plot([-0.15, 0.15], [-0.15, 0.15], 'r--', label='y = x')
ax2.set_xlabel('Simple Return')
ax2.set_ylabel('Log Return')
ax2.set_title('Simple vs Log Returns (nearly identical for small values)')
ax2.legend()

plt.tight_layout()
plt.show()

print('For small returns, simple ≈ log. For large returns, they diverge.')

Exercise 2.1: Calculate Total Returns (Guided)

Your Task: Calculate AAPL's total return using both compounded simple returns and summed log returns.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def calculate_total_return(simple_rets: pd.Series, log_rets: pd.Series) -> dict:
    """
    Calculate total return using both methods.
    """
    # Compound simple returns: (1 + r1) * (1 + r2) * ... - 1
    compounded = (1 + simple_rets).prod() - 1

    # Sum log returns and convert to simple
    log_sum = log_rets.sum()
    from_log = np.exp(log_sum) - 1

    return {
        'compounded_simple': compounded,
        'from_log': from_log
    }

# Test
result = calculate_total_return(simple_returns['AAPL'], log_returns['AAPL'])
print(f"Compounded: {result['compounded_simple']:.2%}")
print(f"From Log: {result['from_log']:.2%}")

Section 2.2: Annualization

Returns and risk are typically quoted on an annual basis for easy comparison.

In this section, you will learn:

  • How to annualize returns properly
  • How to annualize volatility
  • Common pitfalls to avoid

2.2.1 Annualizing Returns

Key insight: Returns compound, so we should compound when annualizing.

Formulas:

  • Daily to Annual: R_annual = (1 + R_daily)^252 - 1
  • Monthly to Annual: R_annual = (1 + R_monthly)^12 - 1
  • Weekly to Annual: R_annual = (1 + R_weekly)^52 - 1

# Compare different annualization methods
daily_mean = simple_returns['SPY'].mean()

print('=== Annualizing Daily Returns ===')
print()
print(f'Daily mean return: {daily_mean:.6f}')
print()

# Wrong way: simple multiplication
wrong_annual = daily_mean * 252

# Right way: compounding
right_annual = (1 + daily_mean) ** 252 - 1

print(f'WRONG (multiply by 252):  {wrong_annual:.2%}')
print(f'RIGHT (compound):         {right_annual:.2%}')
print(f'Difference:               {right_annual - wrong_annual:.2%}')
print()
print('For small returns, the difference is small. For larger returns, it matters!')

2.2.2 Annualizing Volatility

Volatility is annualized differently because variance (not std dev) is additive!

Formula: σ_annual = σ_daily × √252

Why square root?

  • Variance (σ²), not standard deviation, is additive for independent returns
  • σ²_annual = σ²_daily × 252
  • Taking the square root: σ_annual = σ_daily × √252
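The √252 rule can be verified by simulation (a sketch assuming i.i.d. normal daily returns with 1% daily volatility):

```python
import numpy as np

rng = np.random.default_rng(42)
daily_std = 0.01                 # assumed 1% daily volatility
n_years, n_days = 10_000, 252

# Simulate 10,000 independent "years" of i.i.d. daily returns
daily = rng.normal(0.0, daily_std, size=(n_years, n_days))
annual_sums = daily.sum(axis=1)  # annual return as sum of daily returns

empirical = annual_sums.std()
theoretical = daily_std * np.sqrt(n_days)
print(f'Empirical std of annual sums: {empirical:.4f}')
print(f'daily_std * sqrt(252):        {theoretical:.4f}')
```

Note the rule rests on the independence assumption; autocorrelated returns scale differently, which Module 3 revisits.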

# Annualize volatility correctly
print('=== Annualized Volatility ===')
print()

for ticker in tickers:
    daily_vol = simple_returns[ticker].std()
    annual_vol = daily_vol * np.sqrt(252)
    print(f'{ticker}: Daily {daily_vol:.4f} → Annual {annual_vol:.2%}')
# Wrong vs right volatility annualization
daily_vol = simple_returns['SPY'].std()

wrong_vol = daily_vol * 252
right_vol = daily_vol * np.sqrt(252)

print('=== Volatility Annualization ===')
print()
print(f'Daily volatility:          {daily_vol:.4f}')
print()
print(f'WRONG (multiply by 252):   {wrong_vol:.2%}  ← Nonsensical!')
print(f'RIGHT (multiply by √252):  {right_vol:.2%}  ← Makes sense')
print()
print('Remember: √252 ≈ 15.87')

Exercise 2.2: Annualize Quarterly Data (Guided)

Your Task: Calculate quarterly returns for AAPL and annualize both return and volatility.

Fill in the blanks:

Exercise
Click to reveal solution
def annualize_quarterly(prices_series: pd.Series) -> dict:
    """
    Calculate and annualize quarterly statistics.
    """
    # Resample to quarterly prices (end of quarter)
    quarterly_prices = prices_series.resample('Q').last()  # note: pandas >= 2.2 prefers the 'QE' alias

    # Calculate quarterly returns
    quarterly_returns = quarterly_prices.pct_change().dropna()

    # Calculate quarterly statistics
    q_mean = quarterly_returns.mean()
    q_vol = quarterly_returns.std()

    # Annualize (4 quarters per year)
    annual_return = (1 + q_mean) ** 4 - 1
    annual_vol = q_vol * np.sqrt(4)

    return {
        'quarterly_return': q_mean,
        'quarterly_vol': q_vol,
        'annual_return': annual_return,
        'annual_vol': annual_vol
    }

# Test
result = annualize_quarterly(prices['AAPL'])
print(f"Quarterly Return: {result['quarterly_return']:.2%}")
print(f"Annualized Return: {result['annual_return']:.2%}")
print(f"Quarterly Vol: {result['quarterly_vol']:.2%}")
print(f"Annualized Vol: {result['annual_vol']:.2%}")

Section 2.3: Risk-Adjusted Returns

Raw returns don't tell the whole story. A 20% return with 50% volatility isn't as good as a 15% return with 10% volatility!

In this section, you will learn:

  • Sharpe Ratio: The most popular risk-adjusted metric
  • Sortino Ratio: Penalizes only downside risk
  • Calmar Ratio: Uses maximum drawdown as risk
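A quick arithmetic check of that claim, using the Sharpe ratio defined just below and this module's assumed 2% risk-free rate:

```python
# Sharpe = (return - risk_free) / volatility
risk_free = 0.02

sharpe_high_vol = (0.20 - risk_free) / 0.50   # 20% return at 50% volatility
sharpe_low_vol = (0.15 - risk_free) / 0.10    # 15% return at 10% volatility

print(f'20% return, 50% vol: Sharpe = {sharpe_high_vol:.2f}')
print(f'15% return, 10% vol: Sharpe = {sharpe_low_vol:.2f}')
```

Per unit of risk taken, the lower-return portfolio earns more than three times as much.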

2.3.1 The Sharpe Ratio

Formula: Sharpe = (R_portfolio - R_f) / σ_portfolio

Where:

  • R_portfolio = Portfolio return
  • R_f = Risk-free rate (e.g., T-bills)
  • σ_portfolio = Portfolio volatility

Interpretation:

  • Sharpe < 1: Subpar risk-adjusted returns
  • 1 ≤ Sharpe < 2: Good
  • 2 ≤ Sharpe < 3: Very good
  • Sharpe ≥ 3: Excellent (rare over long periods)

# Calculate Sharpe Ratio
risk_free_rate = 0.02  # Assume 2% annual risk-free rate

def calculate_sharpe(returns: pd.Series, risk_free_rate: float = 0.02) -> float:
    """Calculate annualized Sharpe ratio."""
    excess_returns = returns - (risk_free_rate / 252)
    return (excess_returns.mean() / excess_returns.std()) * np.sqrt(252)

print('=== Sharpe Ratios (Risk-Free Rate = 2%) ===')
print()

sharpe_ratios = {}
for ticker in tickers:
    sharpe = calculate_sharpe(simple_returns[ticker])
    sharpe_ratios[ticker] = sharpe
    # Label using the scale from the interpretation guide above
    if sharpe >= 3:
        quality = 'Excellent'
    elif sharpe >= 2:
        quality = 'Very good'
    elif sharpe >= 1:
        quality = 'Good'
    else:
        quality = 'Subpar'
    print(f'{ticker}: {sharpe:.3f} ({quality})')

2.3.2 The Sortino Ratio

The Sharpe ratio penalizes all volatility equally. But investors mainly dislike downside volatility!

Formula: Sortino = (R_portfolio - R_f) / σ_downside

Where σ_downside only considers returns below a threshold (typically 0 or R_f).

def calculate_sortino(returns: pd.Series, risk_free_rate: float = 0.02, target: float = 0) -> float:
    """Calculate annualized Sortino ratio."""
    excess_returns = returns - (risk_free_rate / 252)
    
    # Downside deviation: only returns below target
    downside_returns = returns[returns < target]
    downside_std = np.sqrt(np.mean(downside_returns**2))
    
    return (excess_returns.mean() / downside_std) * np.sqrt(252)

print('=== Sortino Ratios (vs Sharpe) ===')
print()
print(f'{"Asset":<6} {"Sharpe":>10} {"Sortino":>10}')
print('-' * 28)

for ticker in tickers:
    sharpe = calculate_sharpe(simple_returns[ticker])
    sortino = calculate_sortino(simple_returns[ticker])
    print(f'{ticker:<6} {sharpe:>10.3f} {sortino:>10.3f}')

2.3.3 The Calmar Ratio

Instead of volatility, the Calmar ratio uses maximum drawdown as the risk measure.

Formula: Calmar = Annual Return / Maximum Drawdown

Maximum Drawdown: The largest peak-to-trough decline in portfolio value.

def calculate_max_drawdown(prices: pd.Series) -> float:
    """Calculate maximum drawdown from price series."""
    running_max = prices.cummax()
    drawdown = (prices - running_max) / running_max
    return drawdown.min()

def calculate_calmar(returns: pd.Series, prices: pd.Series) -> float:
    """Calculate Calmar ratio."""
    annual_return = (1 + returns.mean()) ** 252 - 1
    max_dd = calculate_max_drawdown(prices)
    return annual_return / abs(max_dd)

print('=== Maximum Drawdowns & Calmar Ratios ===')
print()
print(f'{"Asset":<6} {"Max Drawdown":>12} {"Annual Return":>14} {"Calmar":>10}')
print('-' * 46)

for ticker in tickers:
    max_dd = calculate_max_drawdown(prices[ticker])
    annual_ret = (1 + simple_returns[ticker].mean()) ** 252 - 1
    calmar = calculate_calmar(simple_returns[ticker], prices[ticker])
    print(f'{ticker:<6} {max_dd:>12.2%} {annual_ret:>14.2%} {calmar:>10.3f}')
# Visualize drawdowns
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Normalize prices to start at 100
normalized = prices / prices.iloc[0] * 100

# Price chart
ax1 = axes[0]
for ticker in tickers:
    ax1.plot(normalized[ticker], label=ticker, linewidth=1.5)
ax1.set_ylabel('Normalized Price (100 = Start)')
ax1.set_title('Asset Prices Over Time')
ax1.legend(loc='upper left')

# Drawdown chart
ax2 = axes[1]
for ticker in tickers:
    running_max = prices[ticker].cummax()
    drawdown = (prices[ticker] - running_max) / running_max
    ax2.fill_between(drawdown.index, 0, drawdown, alpha=0.3, label=ticker)
ax2.set_ylabel('Drawdown')
ax2.set_title('Underwater Chart (Drawdowns)')
ax2.legend(loc='lower left')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))

plt.tight_layout()
plt.show()

Exercise 2.3: Calculate Risk-Adjusted Metrics (Guided)

Your Task: Build a function that calculates all three risk-adjusted ratios for a given asset.

Fill in the blanks:

Exercise
Click to reveal solution
def risk_adjusted_metrics(returns: pd.Series, prices: pd.Series, 
                          risk_free: float = 0.02) -> dict:
    """
    Calculate Sharpe, Sortino, and Calmar ratios.
    """
    # Calculate excess returns
    daily_rf = risk_free / 252
    excess = returns - daily_rf

    # Sharpe = annualized excess return / annualized volatility
    sharpe = (excess.mean() / excess.std()) * np.sqrt(252)

    # Sortino - only downside volatility
    downside = returns[returns < 0]
    downside_std = np.sqrt((downside ** 2).mean())
    sortino = (excess.mean() / downside_std) * np.sqrt(252)

    # Calmar = annual return / |max drawdown|
    annual_ret = (1 + returns.mean()) ** 252 - 1
    running_max = prices.cummax()
    drawdown = (prices - running_max) / running_max
    max_dd = drawdown.min()
    calmar = annual_ret / abs(max_dd)

    return {'sharpe': sharpe, 'sortino': sortino, 'calmar': calmar}

# Test
metrics = risk_adjusted_metrics(simple_returns['AAPL'], prices['AAPL'])
for name, value in metrics.items():
    print(f"{name.capitalize()}: {value:.3f}")

Section 2.4: Benchmark Comparison

How does your strategy compare to a simple benchmark? This section covers Alpha, Beta, and other benchmark-relative metrics.

In this section, you will learn:

  • Alpha: Excess return vs benchmark
  • Beta: Sensitivity to benchmark movements
  • Information Ratio: Risk-adjusted active return

2.4.1 Beta: Market Sensitivity

Beta measures how much an asset moves relative to the market.

Formula: β = Cov(R_asset, R_market) / Var(R_market)

Interpretation:

  • β = 1: Moves exactly with the market
  • β > 1: More volatile than the market (amplifies movements)
  • β < 1: Less volatile than the market (dampens movements)
  • β < 0: Moves opposite to the market (rare)

# Calculate Beta using SPY as the market
market_returns = simple_returns['SPY']

def calculate_beta(asset_returns: pd.Series, market_returns: pd.Series) -> float:
    """Calculate beta relative to market."""
    covariance = asset_returns.cov(market_returns)
    market_variance = market_returns.var()
    return covariance / market_variance

print('=== Beta Values (vs SPY) ===')
print()

for ticker in tickers:
    if ticker == 'SPY':
        beta = 1.0
    else:
        beta = calculate_beta(simple_returns[ticker], market_returns)
    
    interpretation = 'Amplifies market moves' if beta > 1 else ('Dampens market moves' if beta < 1 else 'Moves with market')
    print(f'{ticker}: {beta:.3f} ({interpretation})')

2.4.2 Alpha: Excess Return

Alpha measures return above what beta predicts.

Formula: α = R_asset - [R_f + β × (R_market - R_f)]

Interpretation:

  • α > 0: Asset outperformed risk-adjusted expectations
  • α < 0: Asset underperformed
  • α = 0: Performance exactly as expected given its beta

def calculate_alpha(asset_returns: pd.Series, market_returns: pd.Series, 
                    risk_free_rate: float = 0.02) -> float:
    """Calculate annualized alpha using CAPM."""
    beta = calculate_beta(asset_returns, market_returns)
    
    # Annualized returns
    asset_annual = (1 + asset_returns.mean()) ** 252 - 1
    market_annual = (1 + market_returns.mean()) ** 252 - 1
    
    # Expected return based on CAPM
    expected = risk_free_rate + beta * (market_annual - risk_free_rate)
    
    # Alpha is the difference
    return asset_annual - expected

print('=== Alpha Values (Annualized) ===')
print()

for ticker in tickers:
    beta = calculate_beta(simple_returns[ticker], market_returns)
    alpha = calculate_alpha(simple_returns[ticker], market_returns)
    
    performance = 'Outperformed' if alpha > 0 else 'Underperformed'
    print(f'{ticker}: α = {alpha:>7.2%}, β = {beta:.3f} ({performance})')

2.4.3 Information Ratio

For active managers, we care about:

  • Active Return: Return difference vs benchmark
  • Tracking Error: Volatility of active returns
  • Information Ratio: Active return / Tracking error

def calculate_information_ratio(asset_returns: pd.Series, 
                                 benchmark_returns: pd.Series) -> float:
    """Calculate Information Ratio."""
    active_returns = asset_returns - benchmark_returns
    active_return_annual = active_returns.mean() * 252
    tracking_error = active_returns.std() * np.sqrt(252)
    return active_return_annual / tracking_error

print('=== Benchmark-Relative Metrics (vs SPY) ===')
print()
print(f'{"Asset":<6} {"Active Return":>14} {"Tracking Error":>14} {"Info Ratio":>12}')
print('-' * 50)

for ticker in tickers:
    if ticker == 'SPY':
        continue
        
    active_ret = (simple_returns[ticker].mean() - market_returns.mean()) * 252
    te = (simple_returns[ticker] - market_returns).std() * np.sqrt(252)
    ir = calculate_information_ratio(simple_returns[ticker], market_returns)
    
    print(f'{ticker:<6} {active_ret:>14.2%} {te:>14.2%} {ir:>12.3f}')

Exercise 2.4: Build a Performance Comparison Tool (Open-ended)

Your Task:

Build a function that compares any asset against a benchmark and returns a comprehensive report including:

  • Alpha and Beta
  • Information Ratio
  • Correlation with benchmark
  • Relative Sharpe (asset Sharpe minus benchmark Sharpe)

Your implementation:

Exercise
Click to reveal solution
def compare_to_benchmark(asset_returns: pd.Series, 
                         benchmark_returns: pd.Series,
                         risk_free: float = 0.02) -> dict:
    """
    Comprehensive benchmark comparison.

    Args:
        asset_returns: Daily returns of asset
        benchmark_returns: Daily returns of benchmark
        risk_free: Annual risk-free rate

    Returns:
        Dictionary with all comparison metrics
    """
    # Beta
    cov = asset_returns.cov(benchmark_returns)
    var = benchmark_returns.var()
    beta = cov / var

    # Alpha (annualized)
    asset_annual = (1 + asset_returns.mean()) ** 252 - 1
    bench_annual = (1 + benchmark_returns.mean()) ** 252 - 1
    expected = risk_free + beta * (bench_annual - risk_free)
    alpha = asset_annual - expected

    # Information Ratio
    active = asset_returns - benchmark_returns
    ir = (active.mean() * 252) / (active.std() * np.sqrt(252))

    # Correlation
    correlation = asset_returns.corr(benchmark_returns)

    # Sharpe comparison
    excess_asset = asset_returns - risk_free/252
    excess_bench = benchmark_returns - risk_free/252
    sharpe_asset = (excess_asset.mean() / excess_asset.std()) * np.sqrt(252)
    sharpe_bench = (excess_bench.mean() / excess_bench.std()) * np.sqrt(252)

    return {
        'alpha': alpha,
        'beta': beta,
        'information_ratio': ir,
        'correlation': correlation,
        'asset_sharpe': sharpe_asset,
        'benchmark_sharpe': sharpe_bench,
        'relative_sharpe': sharpe_asset - sharpe_bench
    }

# Test with AAPL vs SPY
comparison = compare_to_benchmark(simple_returns['AAPL'], simple_returns['SPY'])
print("=== AAPL vs SPY Comparison ===")
print(f"Alpha: {comparison['alpha']:.2%}")
print(f"Beta: {comparison['beta']:.3f}")
print(f"Information Ratio: {comparison['information_ratio']:.3f}")
print(f"Correlation: {comparison['correlation']:.3f}")
print(f"Asset Sharpe: {comparison['asset_sharpe']:.3f}")
print(f"Benchmark Sharpe: {comparison['benchmark_sharpe']:.3f}")
print(f"Relative Sharpe: {comparison['relative_sharpe']:.3f}")

Exercise 2.5: Multi-Asset Return Analysis (Open-ended)

Your Task:

Build a function that:

  • Takes a list of tickers and a date range
  • Calculates simple and log returns for each
  • Computes annualized return and volatility
  • Returns a DataFrame sorted by Sharpe ratio

Your implementation:

Exercise
Click to reveal solution
def analyze_multiple_assets(tickers: list, 
                            start: str, 
                            end: str,
                            risk_free: float = 0.02) -> pd.DataFrame:
    """
    Analyze returns for multiple assets.

    Args:
        tickers: List of ticker symbols
        start: Start date string
        end: End date string
        risk_free: Annual risk-free rate

    Returns:
        DataFrame sorted by Sharpe ratio
    """
    # Download data
    data = yf.download(tickers, start=start, end=end, progress=False)

    # Handle column structure
    if isinstance(data.columns, pd.MultiIndex):
        prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
    else:
        prices = data

    results = []
    for ticker in tickers:
        if ticker not in prices.columns:
            continue

        p = prices[ticker].dropna()
        simple_ret = p.pct_change().dropna()
        log_ret = np.log(p / p.shift(1)).dropna()

        # Annualized metrics
        annual_return = (1 + simple_ret.mean()) ** 252 - 1
        annual_vol = simple_ret.std() * np.sqrt(252)

        # Sharpe
        excess = simple_ret - risk_free/252
        sharpe = (excess.mean() / excess.std()) * np.sqrt(252)

        # Total return
        total_return = (p.iloc[-1] / p.iloc[0]) - 1

        results.append({
            'Ticker': ticker,
            'Total Return': total_return,
            'Annual Return': annual_return,
            'Annual Vol': annual_vol,
            'Sharpe': sharpe
        })

    df = pd.DataFrame(results).set_index('Ticker')
    return df.sort_values('Sharpe', ascending=False)

# Test
analysis = analyze_multiple_assets(
    ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META'],
    '2020-01-01', '2024-01-01'
)
print(analysis.to_string(formatters={
    'Total Return': '{:.2%}'.format,
    'Annual Return': '{:.2%}'.format,
    'Annual Vol': '{:.2%}'.format,
    'Sharpe': '{:.3f}'.format
}))

Exercise 2.6: Drawdown Analysis Tool (Open-ended)

Your Task:

Build a comprehensive drawdown analyzer that:

  • Calculates maximum drawdown and its duration
  • Finds the top 5 worst drawdown periods
  • Calculates average time to recovery
  • Plots the underwater chart

Your implementation:

Exercise
Click to reveal solution
class DrawdownAnalyzer:
    """
    Comprehensive drawdown analysis tool.
    """

    def __init__(self, prices: pd.Series):
        self.prices = prices
        self.running_max = prices.cummax()
        self.drawdown = (prices - self.running_max) / self.running_max

    def max_drawdown(self) -> float:
        """Return maximum drawdown."""
        return self.drawdown.min()

    def max_drawdown_duration(self) -> int:
        """Calculate duration of max drawdown in days."""
        # Find trough date
        trough_date = self.drawdown.idxmin()

        # Find peak before trough
        peak_date = self.prices[:trough_date].idxmax()

        # Find recovery date (if any)
        post_trough = self.prices[trough_date:]
        peak_value = self.prices[peak_date]
        recovery = post_trough[post_trough >= peak_value]

        if len(recovery) > 0:
            recovery_date = recovery.index[0]
            return (recovery_date - peak_date).days
        else:
            return (self.prices.index[-1] - peak_date).days

    def top_drawdowns(self, n: int = 5) -> pd.DataFrame:
        """Find top N drawdown periods."""
        # Simple approach: find local minima
        dd = self.drawdown.copy()
        results = []

        for i in range(n):
            if dd.min() >= 0:
                break
            trough_date = dd.idxmin()
            trough_value = dd.min()

            results.append({
                'Trough Date': trough_date,
                'Drawdown': trough_value
            })

            # Zero out this drawdown period
            mask = (dd.index >= trough_date - pd.Timedelta(days=30)) & \
                   (dd.index <= trough_date + pd.Timedelta(days=30))
            dd[mask] = 0

        return pd.DataFrame(results)

    def plot(self):
        """Plot underwater chart."""
        fig, ax = plt.subplots(figsize=(14, 6))
        ax.fill_between(self.drawdown.index, 0, self.drawdown, 
                        alpha=0.5, color='red')
        ax.set_title('Underwater Chart (Drawdowns)')
        ax.set_ylabel('Drawdown')
        ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
        plt.tight_layout()
        plt.show()

# Test
analyzer = DrawdownAnalyzer(prices['SPY'])
print(f"Max Drawdown: {analyzer.max_drawdown():.2%}")
print(f"Max DD Duration: {analyzer.max_drawdown_duration()} days")
print("\nTop 5 Drawdowns:")
print(analyzer.top_drawdowns())
analyzer.plot()

Module Project: Complete Performance Report

Create a professional performance report for a portfolio.

Your Challenge:

Create an equal-weight portfolio of SPY, AAPL, GLD, and TLT, then:

  1. Calculate portfolio returns (simple returns weighted 25% each)
  2. Compare to SPY as benchmark
  3. Report: Annual return, Volatility, Sharpe, Sortino, Max Drawdown, Alpha, Beta

# YOUR CODE HERE - Build the portfolio and create a complete report
Click to reveal solution
# Complete Performance Report Solution

# Create equal-weight portfolio
portfolio_tickers = ['SPY', 'AAPL', 'GLD', 'TLT']
weights = [0.25, 0.25, 0.25, 0.25]

# Calculate portfolio returns
portfolio_returns = sum(w * simple_returns[t] for w, t in zip(weights, portfolio_tickers))

# Create portfolio price series for drawdown calculation
portfolio_prices = (1 + portfolio_returns).cumprod() * 100

print('='*60)
print('PORTFOLIO PERFORMANCE REPORT')
print('Equal-Weight: SPY (25%), AAPL (25%), GLD (25%), TLT (25%)')
print('='*60)

# Calculate all metrics
annual_return = (1 + portfolio_returns.mean()) ** 252 - 1
annual_vol = portfolio_returns.std() * np.sqrt(252)

# Sharpe
risk_free = 0.02
excess = portfolio_returns - risk_free/252
sharpe = (excess.mean() / excess.std()) * np.sqrt(252)

# Sortino
downside = portfolio_returns[portfolio_returns < 0]
downside_std = np.sqrt((downside ** 2).mean())
sortino = (excess.mean() / downside_std) * np.sqrt(252)

# Max Drawdown
running_max = portfolio_prices.cummax()
drawdown = (portfolio_prices - running_max) / running_max
max_dd = drawdown.min()

# Calmar
calmar = annual_return / abs(max_dd)

# Alpha and Beta (vs SPY)
market = simple_returns['SPY']
cov = portfolio_returns.cov(market)
var = market.var()
beta = cov / var

market_annual = (1 + market.mean()) ** 252 - 1
expected = risk_free + beta * (market_annual - risk_free)
alpha = annual_return - expected

# Benchmark metrics
spy_annual = market_annual
spy_vol = market.std() * np.sqrt(252)
spy_excess = market - risk_free/252
spy_sharpe = (spy_excess.mean() / spy_excess.std()) * np.sqrt(252)
spy_max_dd = ((prices['SPY'] - prices['SPY'].cummax()) / prices['SPY'].cummax()).min()

# Print Report
print()
print('--- Return Metrics ---')
print(f'{"Metric":<25} {"Portfolio":>15} {"SPY Benchmark":>15}')
print('-' * 55)
print(f'{"Annual Return":<25} {annual_return:>15.2%} {spy_annual:>15.2%}')
print(f'{"Annual Volatility":<25} {annual_vol:>15.2%} {spy_vol:>15.2%}')

print()
print('--- Risk-Adjusted Metrics ---')
print(f'{"Sharpe Ratio":<25} {sharpe:>15.3f} {spy_sharpe:>15.3f}')
print(f'{"Sortino Ratio":<25} {sortino:>15.3f}')
print(f'{"Calmar Ratio":<25} {calmar:>15.3f}')
print(f'{"Maximum Drawdown":<25} {max_dd:>15.2%} {spy_max_dd:>15.2%}')

print()
print('--- Benchmark-Relative Metrics ---')
print(f'{"Alpha":<25} {alpha:>15.2%}')
print(f'{"Beta":<25} {beta:>15.3f}')

print()
print('='*60)
print('SUMMARY: ', end='')
if sharpe > spy_sharpe and alpha > 0:
    print('Portfolio OUTPERFORMED SPY on risk-adjusted basis!')
else:
    print('Portfolio underperformed SPY on risk-adjusted basis.')

Key Takeaways

What You Learned

1. Types of Returns

  • Simple returns: Intuitive, additive across assets, NOT additive across time
  • Log returns: Additive across time, better statistical properties
  • Use simple for portfolio weights, log for time-series analysis

2. Annualization

  • Returns: Compound using (1 + r)^n - 1
  • Volatility: Multiply by √n (not n!)
  • Use 252 for daily stock data, 12 for monthly, etc.

3. Risk-Adjusted Metrics

  • Sharpe Ratio: Return per unit of total risk (most popular)
  • Sortino Ratio: Return per unit of downside risk
  • Calmar Ratio: Return per unit of maximum drawdown

4. Benchmark Comparison

  • Alpha: Return above what beta predicts (skill measure)
  • Beta: Sensitivity to benchmark movements
  • Information Ratio: Risk-adjusted active return vs benchmark

Formula Reference

Metric Formula
Simple Return R = P₁/P₀ - 1
Log Return r = ln(P₁/P₀)
Annual Return (1 + daily)^252 - 1
Annual Vol daily_vol × √252
Sharpe (R - Rf) / σ
Sortino (R - Rf) / σ_downside
Calmar Annual Return / Max DD
Beta Cov(R, Rm) / Var(Rm)
Alpha R - [Rf + β(Rm - Rf)]

Coming Up Next

In Module 3: Time Series Analysis, we'll explore:

  • Stationarity and why it matters
  • Autocorrelation in returns
  • Moving statistics (rolling, expanding, EWM)
  • Volatility clustering and modeling


Congratulations on completing Module 2! You now have the tools to properly analyze and compare investment performance.

Module 3: Time Series Analysis

Course 3: Quantitative Finance & Portfolio Theory
Part 1: Statistical Foundations


Learning Objectives

By the end of this module, you will be able to:

  1. Test for stationarity and understand why it matters
  2. Analyze autocorrelation in financial returns
  3. Apply moving statistics (rolling, expanding, exponential)
  4. Understand and model volatility clustering
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Modules 1-2 (Statistics, Return Analysis)

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Financial Data

# Download stock data
tickers = ['SPY', 'AAPL', 'MSFT', 'GLD', 'TLT']
start_date = '2019-01-01'
end_date = '2024-01-01'

print(f'Downloading data for {tickers}...')
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle different yfinance column structures
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

# Calculate returns
returns = prices.pct_change().dropna()
log_returns = np.log(prices / prices.shift(1)).dropna()

print(f'Data range: {prices.index.min().date()} to {prices.index.max().date()}')
print(f'Trading days: {len(prices)}')
prices.tail()

Section 3.1: Stationarity

Stationarity is a fundamental concept in time series analysis. Most statistical models assume stationarity!

In this section, you will learn:

  • What stationarity means and why it matters
  • How to test for stationarity (ADF test)
  • How to transform non-stationary data

3.1.1 What is Stationarity?

A time series is stationary if its statistical properties don't change over time:

  1. Constant mean: E[X_t] = μ for all t
  2. Constant variance: Var(X_t) = σ² for all t
  3. Covariance depends only on lag: Cov(X_t, X_{t+k}) depends only on k, not t

Why does it matter?

  • Non-stationary data can lead to spurious correlations
  • Most forecasting models require stationarity
  • Statistical inference assumes constant distributions

# Compare prices (non-stationary) vs returns (stationary)
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

spy_prices = prices['SPY']
spy_returns = returns['SPY']

# Prices
ax1 = axes[0, 0]
ax1.plot(spy_prices)
ax1.set_title('SPY Prices (Non-Stationary)')
ax1.set_ylabel('Price')

# Returns
ax2 = axes[0, 1]
ax2.plot(spy_returns)
ax2.axhline(y=0, color='red', linestyle='--', alpha=0.5)
ax2.set_title('SPY Returns (Stationary)')
ax2.set_ylabel('Return')

# Rolling mean of prices
ax3 = axes[1, 0]
rolling_mean_prices = spy_prices.rolling(60).mean()
ax3.plot(spy_prices, alpha=0.5, label='Prices')
ax3.plot(rolling_mean_prices, color='red', lw=2, label='60-day Mean')
ax3.set_title('Prices: Mean Changes Over Time')
ax3.legend()

# Rolling mean of returns
ax4 = axes[1, 1]
rolling_mean_returns = spy_returns.rolling(60).mean()
ax4.plot(spy_returns, alpha=0.5, label='Returns')
ax4.plot(rolling_mean_returns, color='red', lw=2, label='60-day Mean')
ax4.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax4.set_title('Returns: Mean Stays Around Zero')
ax4.legend()

plt.tight_layout()
plt.show()

print('Notice: Prices have a clear trend (non-stationary).')
print('Returns fluctuate around zero with no trend (stationary).')

3.1.2 Testing for Stationarity: ADF Test

The Augmented Dickey-Fuller (ADF) test is the standard test for stationarity.

Hypotheses:

  • H₀: Series has a unit root (non-stationary)
  • H₁: Series is stationary

Interpretation:

  • p-value < 0.05 → Reject H₀ → Series IS stationary
  • p-value ≥ 0.05 → Cannot reject H₀ → Series is NOT stationary

def test_stationarity(series: pd.Series, name: str = 'Series') -> bool:
    """
    Perform ADF test and print results.
    
    Args:
        series: Time series to test
        name: Name for display
        
    Returns:
        True if stationary, False otherwise
    """
    result = adfuller(series.dropna())
    
    print(f'=== ADF Test: {name} ===')
    print(f'Test Statistic: {result[0]:.4f}')
    print(f'p-value:        {result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'   {key}: {value:.4f}')
    
    if result[1] < 0.05:
        print(f'\nConclusion: {name} IS stationary (p < 0.05)')
        return True
    else:
        print(f'\nConclusion: {name} is NOT stationary (p >= 0.05)')
        return False

# Test prices
print('Testing SPY Prices...')
test_stationarity(prices['SPY'], 'SPY Prices')

print('\n' + '='*50 + '\n')

# Test returns
print('Testing SPY Returns...')
test_stationarity(returns['SPY'], 'SPY Returns')

3.1.3 Making Data Stationary

Common transformations to achieve stationarity:

  1. Differencing: X_t - X_{t-1} (converts prices to returns)
  2. Log transformation: ln(X_t) (stabilizes variance)
  3. Log returns: ln(X_t / X_{t-1}) (combines both)
  4. Detrending: Remove linear or polynomial trend
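Note that transformations 1 and 2 applied together give transformation 3: differencing log prices is identical to taking log returns. A quick check on simulated prices (the drift and volatility values are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Simulated price path from small lognormal daily moves
prices_sim = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 250))))

log_diff = np.log(prices_sim).diff().dropna()               # difference of log prices
log_ret = np.log(prices_sim / prices_sim.shift(1)).dropna() # log returns

print(np.allclose(log_diff, log_ret))  # → True: the two coincide
```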
# Different transformations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

spy = prices['SPY']

# Original prices
ax1 = axes[0, 0]
ax1.plot(spy)
ax1.set_title('Original Prices (Non-Stationary)')

# First difference
ax2 = axes[0, 1]
diff = spy.diff().dropna()
ax2.plot(diff)
ax2.set_title('First Difference (Stationary)')

# Log prices
ax3 = axes[1, 0]
log_prices = np.log(spy)
ax3.plot(log_prices)
ax3.set_title('Log Prices (Still Non-Stationary)')

# Log returns
ax4 = axes[1, 1]
ax4.plot(log_returns['SPY'])
ax4.set_title('Log Returns (Stationary)')

plt.tight_layout()
plt.show()

# Test all transformations
print('=== Stationarity Test Results ===')
print()
transformations = {
    'Original Prices': spy,
    'First Difference': diff,
    'Log Prices': log_prices,
    'Log Returns': log_returns['SPY']
}

for name, series in transformations.items():
    result = adfuller(series.dropna())
    status = 'Stationary' if result[1] < 0.05 else 'Non-Stationary'
    print(f'{name:20} p-value: {result[1]:.4f}  ({status})')

Exercise 3.1: Test Stationarity (Guided)

Your Task: Build a function that tests stationarity for multiple assets and returns a summary DataFrame.

Fill in the blanks:

Exercise
Click to reveal solution
def stationarity_summary(prices_df: pd.DataFrame, returns_df: pd.DataFrame) -> pd.DataFrame:
    """
    Test stationarity for all assets, both prices and returns.
    """
    results = []

    for ticker in prices_df.columns:
        # Run ADF test on prices
        price_result = adfuller(prices_df[ticker].dropna())
        price_pval = price_result[1]

        # Run ADF test on returns
        return_result = adfuller(returns_df[ticker].dropna())
        return_pval = return_result[1]

        results.append({
            'Ticker': ticker,
            'Prices p-value': price_pval,
            'Prices Stationary': price_pval < 0.05,
            'Returns p-value': return_pval,
            'Returns Stationary': return_pval < 0.05
        })

    return pd.DataFrame(results).set_index('Ticker')

# Test
summary = stationarity_summary(prices, returns)
print(summary)
print("\nKey Finding: All prices are non-stationary, all returns are stationary!")

Section 3.2: Autocorrelation

Autocorrelation measures how today's value relates to past values. It's key for understanding predictability.

In this section, you will learn:

  • What autocorrelation means
  • How to compute and visualize ACF/PACF
  • What autocorrelation patterns tell us about markets

3.2.1 Understanding Autocorrelation

Autocorrelation at lag k: Correlation between X_t and X_{t-k}

$$\rho_k = \frac{Cov(X_t, X_{t-k})}{Var(X_t)}$$

Interpretation:

  • ρ_k > 0: Positive values tend to follow positive values (momentum)
  • ρ_k < 0: Positive values tend to follow negative values (mean reversion)
  • ρ_k ≈ 0: No linear relationship (random/efficient)
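These three regimes can be simulated with AR(1) processes; a sketch with illustrative coefficients of ±0.5 (not estimated from market data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000
eps = rng.normal(0, 1, n)

def ar1(phi):
    # x_t = phi * x_{t-1} + eps_t
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t-1] + eps[t]
    return pd.Series(x)

for phi, label in [(0.5, 'momentum'), (-0.5, 'mean reversion'), (0.0, 'random')]:
    s = ar1(phi)
    print(f'phi={phi:+.1f} ({label}): lag-1 autocorr = {s.autocorr(1):+.3f}')
```

The sample lag-1 autocorrelation lands near each process's φ, matching the three interpretations above.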

# Calculate autocorrelation manually
spy_ret = returns['SPY']

print('=== Manual Autocorrelation Calculation ===')
print()

for lag in range(1, 6):
    # Correlation between returns and lagged returns
    autocorr = spy_ret.corr(spy_ret.shift(lag))
    print(f'Lag {lag}: {autocorr:.4f}')

print()
print('Values close to zero suggest returns are roughly independent.')
print('This is consistent with the Efficient Market Hypothesis!')

3.2.2 ACF and PACF Plots

ACF (Autocorrelation Function): Shows correlation at all lags

PACF (Partial Autocorrelation Function): Shows correlation at lag k after removing effects of lags 1 to k-1

The blue bands represent 95% confidence intervals. Significant autocorrelations extend beyond the bands.

# ACF and PACF for returns
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# SPY Returns ACF
plot_acf(returns['SPY'].dropna(), ax=axes[0, 0], lags=30, alpha=0.05)
axes[0, 0].set_title('SPY Returns - ACF')

# SPY Returns PACF
plot_pacf(returns['SPY'].dropna(), ax=axes[0, 1], lags=30, alpha=0.05)
axes[0, 1].set_title('SPY Returns - PACF')

# SPY Squared Returns ACF (volatility clustering)
plot_acf(returns['SPY'].dropna()**2, ax=axes[1, 0], lags=30, alpha=0.05)
axes[1, 0].set_title('SPY Squared Returns - ACF (Volatility Clustering)')

# SPY Absolute Returns ACF
plot_acf(returns['SPY'].dropna().abs(), ax=axes[1, 1], lags=30, alpha=0.05)
axes[1, 1].set_title('SPY Absolute Returns - ACF')

plt.tight_layout()
plt.show()

print('Returns (top): Little autocorrelation - market is efficient')
print('Squared Returns (bottom): Strong autocorrelation - volatility clusters!')

3.2.3 Testing for Significant Autocorrelation

The Ljung-Box test checks if there's any significant autocorrelation up to lag k.

  • H₀: No autocorrelation (returns are independent)
  • H₁: Significant autocorrelation exists
def ljung_box_test(series: pd.Series, lags: int = 10, name: str = 'Series') -> pd.DataFrame:
    """
    Perform Ljung-Box test for autocorrelation.
    
    Args:
        series: Time series to test
        lags: Number of lags to test
        name: Name for display
        
    Returns:
        Test results DataFrame
    """
    result = acorr_ljungbox(series.dropna(), lags=lags, return_df=True)
    
    print(f'=== Ljung-Box Test: {name} ===')
    print(f'Testing for autocorrelation up to lag {lags}')
    print()
    
    min_pval = result['lb_pvalue'].min()
    significant_lags = (result['lb_pvalue'] < 0.05).sum()
    
    print(f'Minimum p-value: {min_pval:.4f}')
    print(f'Significant lags (p < 0.05): {significant_lags} out of {lags}')
    
    if min_pval < 0.05:
        print('\nConclusion: Significant autocorrelation detected!')
    else:
        print('\nConclusion: No significant autocorrelation (consistent with EMH)')
    
    return result

# Test returns
print('--- Testing SPY Returns ---')
lb_returns = ljung_box_test(returns['SPY'], lags=10, name='SPY Returns')

print('\n' + '='*50 + '\n')

# Test squared returns
print('--- Testing SPY Squared Returns ---')
lb_squared = ljung_box_test(returns['SPY']**2, lags=10, name='SPY Squared Returns')

Exercise 3.2: Autocorrelation Analysis (Guided)

Your Task: Build a function that calculates autocorrelation for both returns and squared returns.

Fill in the blanks:

Exercise
Click to reveal solution
def autocorrelation_analysis(returns_series: pd.Series, max_lag: int = 5) -> pd.DataFrame:
    """
    Calculate autocorrelation for returns and squared returns.
    """
    squared = returns_series ** 2
    results = []

    for lag in range(1, max_lag + 1):
        # Calculate autocorrelation of returns
        ret_acf = returns_series.corr(returns_series.shift(lag))

        # Calculate autocorrelation of squared returns
        sq_acf = squared.corr(squared.shift(lag))

        results.append({
            'Lag': lag,
            'Returns ACF': ret_acf,
            'Squared Returns ACF': sq_acf
        })

    return pd.DataFrame(results).set_index('Lag')

# Test
acf_df = autocorrelation_analysis(returns['AAPL'])
print(acf_df)
print("\nNote: Squared returns have MUCH higher autocorrelation!")

Section 3.3: Moving Statistics

Moving statistics help us analyze trends and patterns that change over time.

In this section, you will learn:

  • Rolling (moving window) statistics
  • Expanding (cumulative) statistics
  • Exponentially weighted statistics

3.3.1 Rolling Statistics

Rolling statistics use a fixed window that moves through time.

Common applications:

  • Rolling mean (moving average)
  • Rolling volatility
  • Rolling correlation
  • Rolling Sharpe ratio

# Calculate rolling statistics for SPY
spy_ret = returns['SPY']

# Rolling statistics with different windows
windows = [20, 60, 252]  # 1 month, 3 months, 1 year

fig, axes = plt.subplots(3, 1, figsize=(14, 12))

# Rolling Mean
ax1 = axes[0]
ax1.plot(spy_ret, alpha=0.3, label='Daily Returns')
for window in windows:
    rolling_mean = spy_ret.rolling(window).mean()
    ax1.plot(rolling_mean, label=f'{window}-day Mean', linewidth=1.5)
ax1.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax1.set_title('Rolling Mean (Moving Average)')
ax1.legend()

# Rolling Volatility (annualized)
ax2 = axes[1]
for window in windows:
    rolling_vol = spy_ret.rolling(window).std() * np.sqrt(252)
    ax2.plot(rolling_vol, label=f'{window}-day Vol', linewidth=1.5)
ax2.set_title('Rolling Volatility (Annualized)')
ax2.set_ylabel('Volatility')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax2.legend()

# Rolling Sharpe (annualized)
ax3 = axes[2]
for window in windows:
    rolling_sharpe = (spy_ret.rolling(window).mean() / spy_ret.rolling(window).std()) * np.sqrt(252)
    ax3.plot(rolling_sharpe, label=f'{window}-day Sharpe', linewidth=1.5)
ax3.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax3.axhline(y=1, color='green', linestyle='--', alpha=0.3, label='Sharpe = 1')
ax3.set_title('Rolling Sharpe Ratio')
ax3.legend()

plt.tight_layout()
plt.show()

3.3.2 Expanding Statistics

Expanding statistics include all data from the start up to the current point.

Useful for:

  • Cumulative performance
  • Building samples for statistical tests
  • Comparing to "all-time" metrics

# Expanding statistics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Expanding mean
ax1 = axes[0, 0]
expanding_mean = spy_ret.expanding().mean()
ax1.plot(expanding_mean * 252, label='Expanding Mean (Annualized)')
ax1.axhline(y=spy_ret.mean() * 252, color='red', linestyle='--', label='Final Mean')
ax1.set_title('Expanding Mean Return')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.1%}'))
ax1.legend()

# Expanding volatility
ax2 = axes[0, 1]
expanding_vol = spy_ret.expanding().std() * np.sqrt(252)
ax2.plot(expanding_vol, label='Expanding Vol (Annualized)')
ax2.axhline(y=spy_ret.std() * np.sqrt(252), color='red', linestyle='--', label='Final Vol')
ax2.set_title('Expanding Volatility')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax2.legend()

# Expanding Sharpe
ax3 = axes[1, 0]
expanding_sharpe = (expanding_mean / spy_ret.expanding().std()) * np.sqrt(252)
ax3.plot(expanding_sharpe)
ax3.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax3.set_title('Expanding Sharpe Ratio')

# Cumulative return
ax4 = axes[1, 1]
cumulative_return = (1 + spy_ret).cumprod() - 1
ax4.plot(cumulative_return)
ax4.set_title('Cumulative Return')
ax4.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))

plt.tight_layout()
plt.show()

3.3.3 Exponentially Weighted Statistics

Exponentially weighted statistics give more weight to recent observations.

Key parameter: span (or halflife)

  • Smaller span = more weight on recent data
  • Larger span = smoother, closer to a simple average
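pandas lets you express the decay rate as `span`, `halflife`, `alpha`, or `com`, and they are interchangeable (for example, α = 2 / (span + 1)). A quick sketch confirming the equivalence on simulated returns (span of 20 chosen to match the examples below):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
x = pd.Series(rng.normal(0, 0.01, 500))

span = 20
alpha = 2 / (span + 1)  # pandas' span-to-alpha conversion

ewm_span = x.ewm(span=span).mean()
ewm_alpha = x.ewm(alpha=alpha).mean()

print(np.allclose(ewm_span, ewm_alpha))  # → True: identical weighting
```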

# Exponentially weighted statistics
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# EWM Mean vs Rolling Mean
ax1 = axes[0]
ax1.plot(spy_ret, alpha=0.2, label='Daily Returns')
ax1.plot(spy_ret.rolling(20).mean(), label='20-day Rolling Mean', alpha=0.8)
ax1.plot(spy_ret.ewm(span=20).mean(), label='EWM Mean (span=20)', alpha=0.8)
ax1.axhline(y=0, color='black', linestyle='--', alpha=0.3)
ax1.set_title('Rolling Mean vs Exponentially Weighted Mean')
ax1.legend()

# EWM Volatility vs Rolling Volatility
ax2 = axes[1]
rolling_vol_60 = spy_ret.rolling(60).std() * np.sqrt(252)
ewm_vol_60 = spy_ret.ewm(span=60).std() * np.sqrt(252)

ax2.plot(rolling_vol_60, label='60-day Rolling Vol', alpha=0.8)
ax2.plot(ewm_vol_60, label='EWM Vol (span=60)', alpha=0.8)
ax2.set_title('Rolling Volatility vs Exponentially Weighted Volatility')
ax2.set_ylabel('Annualized Volatility')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax2.legend()

plt.tight_layout()
plt.show()

print('EWM reacts faster to recent changes!')

Exercise 3.3: Rolling Correlation (Guided)

Your Task: Build a function that calculates rolling correlation between two assets.

Fill in the blanks:

Exercise
Click to reveal solution
def rolling_correlation(series1: pd.Series, series2: pd.Series, 
                        window: int = 60) -> pd.Series:
    """
    Calculate rolling correlation between two series.
    """
    rolling_corr = series1.rolling(window).corr(series2)
    return rolling_corr

def analyze_rolling_correlation(series1: pd.Series, series2: pd.Series,
                                 name1: str, name2: str,
                                 window: int = 60) -> dict:
    """
    Analyze rolling correlation with summary statistics.
    """
    rolling_corr = rolling_correlation(series1, series2, window)

    return {
        'overall': series1.corr(series2),
        'rolling_mean': rolling_corr.mean(),
        'rolling_min': rolling_corr.min(),
        'rolling_max': rolling_corr.max()
    }

# Test
# (named corr_stats so it doesn't shadow the `scipy.stats` import from setup)
corr_stats = analyze_rolling_correlation(returns['SPY'], returns['AAPL'], 'SPY', 'AAPL')
for key, value in corr_stats.items():
    print(f"{key}: {value:.3f}")

Section 3.4: Volatility Modeling

Volatility clustering is one of the most important stylized facts in finance. This section introduces volatility modeling.

In this section, you will learn:

  • What volatility clustering means
  • Simple volatility forecasting methods
  • Introduction to GARCH concepts

3.4.1 Visualizing Volatility Clustering

# Visualize volatility clustering
spy_ret = returns['SPY']

fig, axes = plt.subplots(3, 1, figsize=(14, 12))

# Returns
ax1 = axes[0]
ax1.plot(spy_ret)
ax1.axhline(y=0, color='red', linestyle='--', alpha=0.5)
ax1.set_title('SPY Returns')
ax1.set_ylabel('Return')

# Absolute returns (volatility proxy)
ax2 = axes[1]
ax2.plot(spy_ret.abs(), alpha=0.7)
ax2.plot(spy_ret.abs().rolling(20).mean(), color='red', lw=2, label='20-day MA')
ax2.set_title('Absolute Returns (Volatility Proxy)')
ax2.set_ylabel('|Return|')
ax2.legend()

# Squared returns
ax3 = axes[2]
ax3.plot(spy_ret**2, alpha=0.7)
ax3.plot((spy_ret**2).rolling(20).mean(), color='red', lw=2, label='20-day MA')
ax3.set_title('Squared Returns (Variance Proxy)')
ax3.set_ylabel('Return²')
ax3.legend()

plt.tight_layout()
plt.show()

print('Notice the clustering: periods of high volatility are followed by high volatility!')

3.4.2 Simple Volatility Forecasting

Before jumping to complex models, let's try simple approaches:

  1. Historical volatility: Use past realized volatility
  2. EWMA: Exponentially weighted moving average
  3. Simple persistence: Tomorrow's vol = today's vol
# Calculate different volatility estimates
window = 20

# Historical (rolling) volatility
hist_vol = spy_ret.rolling(window).std() * np.sqrt(252)

# EWMA volatility (RiskMetrics-style weighting; RiskMetrics itself fixes lambda = 0.94)
ewma_vol = spy_ret.ewm(span=window).std() * np.sqrt(252)

fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(hist_vol, label=f'{window}-day Historical Vol', alpha=0.8)
ax.plot(ewma_vol, label=f'EWMA Vol (span={window})', alpha=0.8)

ax.set_title('Volatility Estimates Comparison')
ax.set_ylabel('Annualized Volatility')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.legend()

plt.tight_layout()
plt.show()

print('EWMA responds faster to changes due to exponential weighting.')

3.4.3 Introduction to GARCH

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) is the standard model for volatility.

GARCH(1,1) Model:

σ²_t = ω + α × ε²_{t-1} + β × σ²_{t-1}

Where:

  • ω = constant term (sets the long-run variance level)
  • α = weight on the recent shock (yesterday's squared return)
  • β = weight on the recent variance (yesterday's variance)

Interpretation:

  • High α: Volatility reacts strongly to shocks
  • High β: Volatility is persistent
  • α + β close to 1: High persistence
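One useful consequence of the recursion: when α + β < 1, the unconditional (long-run) variance is ω / (1 − α − β). A quick check using the same illustrative parameters as the function below (ω = 1e-5, α = 0.1, β = 0.85 are assumed defaults, not fitted values):

```python
import numpy as np

omega, alpha, beta = 1e-5, 0.10, 0.85

# Unconditional daily variance implied by GARCH(1,1)
long_run_var = omega / (1 - alpha - beta)
long_run_vol_annual = np.sqrt(long_run_var * 252)

print(f'Persistence (alpha + beta): {alpha + beta:.2f}')
print(f'Long-run annualized vol:    {long_run_vol_annual:.1%}')
```

This is the level the GARCH variance reverts toward between shocks; as α + β approaches 1, that reversion becomes slower.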

def simple_garch_variance(returns: pd.Series, omega: float = 0.00001, 
                          alpha: float = 0.1, beta: float = 0.85) -> pd.Series:
    """
    Simple GARCH(1,1) variance calculation.
    
    sigma2_t = omega + alpha * epsilon2_{t-1} + beta * sigma2_{t-1}
    
    Args:
        returns: Return series
        omega: Long-run variance weight
        alpha: Weight on recent shock
        beta: Weight on recent variance
        
    Returns:
        Variance series
    """
    n = len(returns)
    sigma2 = np.zeros(n)
    
    # Initialize with sample variance
    sigma2[0] = returns.var()
    
    for t in range(1, n):
        sigma2[t] = omega + alpha * returns.iloc[t-1]**2 + beta * sigma2[t-1]
    
    return pd.Series(sigma2, index=returns.index)

# Calculate GARCH variance
garch_var = simple_garch_variance(spy_ret)
garch_vol = np.sqrt(garch_var) * np.sqrt(252)  # Annualized

# Plot comparison
fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(hist_vol, label='Historical Vol (20-day)', alpha=0.8)
ax.plot(ewma_vol, label='EWMA Vol', alpha=0.8)
ax.plot(garch_vol, label='GARCH-like Vol', alpha=0.8)
ax.set_title('Volatility Model Comparison')
ax.set_ylabel('Annualized Volatility')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.legend()

plt.tight_layout()
plt.show()

print('GARCH captures volatility clustering and adapts to changing market conditions.')

Exercise 3.4: Volatility Forecasting Comparison (Open-ended)

Your Task:

Build a function that:

  • Calculates historical and EWMA volatility forecasts
  • Evaluates forecast accuracy using RMSE
  • Returns a comparison report

Your implementation:

Exercise
Click to reveal solution
def compare_volatility_forecasts(returns: pd.Series, window: int = 20) -> dict:
    """
    Compare historical and EWMA volatility forecasts.

    Args:
        returns: Return series
        window: Window size for historical vol

    Returns:
        Dictionary with comparison metrics
    """
    # Calculate volatility estimates
    hist_vol = returns.rolling(window).std()
    ewma_vol = returns.ewm(span=window).std()

    # Shift forecasts (use yesterday's estimate to predict today)
    hist_forecast = hist_vol.shift(1)
    ewma_forecast = ewma_vol.shift(1)

    # Realized volatility proxy: absolute return
    realized = returns.abs()

    # Calculate RMSE
    def rmse(forecast, realized):
        diff = (forecast - realized).dropna()
        return np.sqrt((diff ** 2).mean())

    hist_rmse = rmse(hist_forecast, realized)
    ewma_rmse = rmse(ewma_forecast, realized)

    # Calculate correlation with realized
    hist_corr = hist_forecast.corr(realized)
    ewma_corr = ewma_forecast.corr(realized)

    return {
        'hist_rmse': hist_rmse,
        'ewma_rmse': ewma_rmse,
        'hist_corr': hist_corr,
        'ewma_corr': ewma_corr,
        'better_model': 'EWMA' if ewma_rmse < hist_rmse else 'Historical'
    }

# Test
comparison = compare_volatility_forecasts(returns['SPY'])
print("=== Volatility Forecast Comparison ===")
print(f"Historical RMSE: {comparison['hist_rmse']:.6f}")
print(f"EWMA RMSE: {comparison['ewma_rmse']:.6f}")
print(f"Historical Correlation: {comparison['hist_corr']:.4f}")
print(f"EWMA Correlation: {comparison['ewma_corr']:.4f}")
print(f"Better Model: {comparison['better_model']}")

Exercise 3.5: Crisis Detection Tool (Open-ended)

Your Task:

Build a tool that:

  • Detects high-volatility regimes using rolling volatility
  • Identifies crisis periods (volatility > 2 standard deviations above mean)
  • Returns dates and duration of crisis periods

Your implementation:

Exercise
Click to reveal solution
class CrisisDetector:
    """
    Detect high-volatility crisis periods.
    """

    def __init__(self, returns: pd.Series, vol_window: int = 20, threshold_std: float = 2.0):
        self.returns = returns
        self.vol_window = vol_window
        self.threshold_std = threshold_std

        # Calculate rolling volatility
        self.rolling_vol = returns.rolling(vol_window).std() * np.sqrt(252)

        # Calculate threshold
        self.vol_mean = self.rolling_vol.mean()
        self.vol_std = self.rolling_vol.std()
        self.threshold = self.vol_mean + threshold_std * self.vol_std

    def detect_crises(self) -> pd.DataFrame:
        """Identify crisis periods."""
        is_crisis = self.rolling_vol > self.threshold

        # Find crisis start and end dates
        crisis_changes = is_crisis.astype(int).diff()
        starts = crisis_changes[crisis_changes == 1].index
        ends = crisis_changes[crisis_changes == -1].index

        crises = []
        for i, start in enumerate(starts):
            # Find corresponding end
            possible_ends = ends[ends > start]
            if len(possible_ends) > 0:
                end = possible_ends[0]
            else:
                end = self.returns.index[-1]

            duration = (end - start).days
            max_vol = self.rolling_vol[start:end].max()

            crises.append({
                'Start': start,
                'End': end,
                'Duration (days)': duration,
                'Max Vol': max_vol
            })

        return pd.DataFrame(crises)

    def plot(self):
        """Plot volatility with crisis highlighting."""
        fig, ax = plt.subplots(figsize=(14, 6))

        ax.plot(self.rolling_vol, label='Rolling Vol')
        ax.axhline(y=self.threshold, color='red', linestyle='--', 
                   label=f'Crisis Threshold ({self.threshold:.1%})')

        # Highlight crisis periods
        is_crisis = self.rolling_vol > self.threshold
        ax.fill_between(self.rolling_vol.index, 0, self.rolling_vol.max(),
                        where=is_crisis, alpha=0.3, color='red', label='Crisis Period')

        ax.set_title('Volatility with Crisis Detection')
        ax.set_ylabel('Annualized Volatility')
        ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
        ax.legend()

        plt.tight_layout()
        plt.show()

# Test
detector = CrisisDetector(returns['SPY'])
crises = detector.detect_crises()
print("=== Crisis Periods Detected ===")
print(crises)
detector.plot()

Exercise 3.6: Time Series Analysis Dashboard (Open-ended)

Your Task:

Build a comprehensive time series analysis class that:

  • Tests for stationarity (prices and returns)
  • Calculates autocorrelation statistics
  • Computes rolling statistics (mean, vol, Sharpe)
  • Generates a complete analysis report

Your implementation:

Exercise
Click to reveal solution
class TimeSeriesAnalyzer:
    """
    Comprehensive time series analysis tool.
    """

    def __init__(self, prices: pd.Series, name: str = 'Asset'):
        self.prices = prices
        self.returns = prices.pct_change().dropna()
        self.name = name

    def stationarity_test(self) -> dict:
        """Test stationarity for prices and returns."""
        price_adf = adfuller(self.prices.dropna())
        return_adf = adfuller(self.returns.dropna())

        return {
            'prices_pvalue': price_adf[1],
            'prices_stationary': price_adf[1] < 0.05,
            'returns_pvalue': return_adf[1],
            'returns_stationary': return_adf[1] < 0.05
        }

    def autocorrelation_test(self, lags: int = 10) -> dict:
        """Test autocorrelation in returns and squared returns."""
        lb_returns = acorr_ljungbox(self.returns.dropna(), lags=lags, return_df=True)
        lb_squared = acorr_ljungbox(self.returns.dropna()**2, lags=lags, return_df=True)

        return {
            'returns_min_pvalue': lb_returns['lb_pvalue'].min(),
            'returns_autocorrelated': lb_returns['lb_pvalue'].min() < 0.05,
            'squared_min_pvalue': lb_squared['lb_pvalue'].min(),
            'squared_autocorrelated': lb_squared['lb_pvalue'].min() < 0.05
        }

    def rolling_stats(self, window: int = 60) -> dict:
        """Calculate current rolling statistics."""
        rolling_mean = self.returns.rolling(window).mean().iloc[-1] * 252
        rolling_vol = self.returns.rolling(window).std().iloc[-1] * np.sqrt(252)
        rolling_sharpe = rolling_mean / rolling_vol if rolling_vol > 0 else 0

        return {
            'rolling_mean_annual': rolling_mean,
            'rolling_vol_annual': rolling_vol,
            'rolling_sharpe': rolling_sharpe
        }

    def generate_report(self) -> str:
        """Generate comprehensive analysis report."""
        stat = self.stationarity_test()
        acf = self.autocorrelation_test()
        rolling = self.rolling_stats()

        report = f"""
========================================
TIME SERIES ANALYSIS: {self.name}
========================================

STATIONARITY:
  Prices:  {'Stationary' if stat['prices_stationary'] else 'Non-Stationary'} (p={stat['prices_pvalue']:.4f})
  Returns: {'Stationary' if stat['returns_stationary'] else 'Non-Stationary'} (p={stat['returns_pvalue']:.4f})

AUTOCORRELATION:
  Returns:         {'Yes' if acf['returns_autocorrelated'] else 'No'} (p={acf['returns_min_pvalue']:.4f})
  Squared Returns: {'Yes' if acf['squared_autocorrelated'] else 'No'} (p={acf['squared_min_pvalue']:.4f})

ROLLING STATS (60-day):
  Annualized Return: {rolling['rolling_mean_annual']:.2%}
  Annualized Vol:    {rolling['rolling_vol_annual']:.2%}
  Sharpe Ratio:      {rolling['rolling_sharpe']:.3f}

========================================
        """
        return report

# Test
analyzer = TimeSeriesAnalyzer(prices['MSFT'], 'MSFT')
print(analyzer.generate_report())

Module Project: Complete Time Series Analysis Report

Create a comprehensive time series analysis for MSFT.

Your Challenge:

Analyze MSFT and produce a report that includes:

  1. Stationarity test (prices vs returns)
  2. Autocorrelation analysis (returns and squared returns)
  3. Rolling statistics (60-day mean, volatility, Sharpe)
  4. Volatility forecast comparison (Historical vs EWMA)

# YOUR CODE HERE - Create a comprehensive time series analysis
Click to reveal solution
# Complete Time Series Analysis for MSFT

msft_prices = prices['MSFT']
msft_returns = returns['MSFT']

print('='*60)
print('MSFT TIME SERIES ANALYSIS REPORT')
print('='*60)

# 1. Stationarity Tests
print('\n--- 1. STATIONARITY ANALYSIS ---\n')

price_adf = adfuller(msft_prices.dropna())
return_adf = adfuller(msft_returns.dropna())

print(f'Prices ADF p-value:  {price_adf[1]:.4f} ({"Stationary" if price_adf[1] < 0.05 else "Non-Stationary"})')
print(f'Returns ADF p-value: {return_adf[1]:.4f} ({"Stationary" if return_adf[1] < 0.05 else "Non-Stationary"})')

# 2. Autocorrelation Analysis
print('\n--- 2. AUTOCORRELATION ANALYSIS ---\n')

print('Returns Autocorrelation (lags 1-5):')
for lag in range(1, 6):
    acf_val = msft_returns.corr(msft_returns.shift(lag))
    print(f'  Lag {lag}: {acf_val:.4f}')

print('\nSquared Returns Autocorrelation (lags 1-5):')
msft_sq = msft_returns ** 2
for lag in range(1, 6):
    acf_val = msft_sq.corr(msft_sq.shift(lag))
    print(f'  Lag {lag}: {acf_val:.4f}')

# Ljung-Box test
lb_returns = acorr_ljungbox(msft_returns.dropna(), lags=10, return_df=True)
lb_squared = acorr_ljungbox(msft_sq.dropna(), lags=10, return_df=True)

print(f'\nLjung-Box Test (lag 10):')
print(f'  Returns p-value:         {lb_returns["lb_pvalue"].iloc[-1]:.4f}')
print(f'  Squared Returns p-value: {lb_squared["lb_pvalue"].iloc[-1]:.4f}')

# 3. Rolling Statistics
print('\n--- 3. ROLLING STATISTICS (60-day) ---\n')

rolling_mean = msft_returns.rolling(60).mean() * 252
rolling_vol = msft_returns.rolling(60).std() * np.sqrt(252)
rolling_sharpe = rolling_mean / rolling_vol

print(f'Current 60-day Mean Return (Ann): {rolling_mean.iloc[-1]:.2%}')
print(f'Current 60-day Volatility (Ann):  {rolling_vol.iloc[-1]:.2%}')
print(f'Current 60-day Sharpe Ratio:      {rolling_sharpe.iloc[-1]:.3f}')

# 4. Volatility Comparison
print('\n--- 4. VOLATILITY FORECAST COMPARISON ---\n')

hist_vol = msft_returns.rolling(20).std() * np.sqrt(252)
ewma_vol = msft_returns.ewm(span=20).std() * np.sqrt(252)

realized_var = msft_returns ** 2
hist_forecast = (hist_vol.shift(1) / np.sqrt(252)) ** 2
ewma_forecast = (ewma_vol.shift(1) / np.sqrt(252)) ** 2

hist_rmse = np.sqrt(((hist_forecast - realized_var).dropna() ** 2).mean())
ewma_rmse = np.sqrt(((ewma_forecast - realized_var).dropna() ** 2).mean())

print(f'Historical Vol RMSE: {hist_rmse:.6f}')
print(f'EWMA Vol RMSE:       {ewma_rmse:.6f}')
print(f'\nBetter Model: {"EWMA" if ewma_rmse < hist_rmse else "Historical"}')

print('\n' + '='*60)
print('END OF REPORT')
print('='*60)

Key Takeaways

What You Learned

1. Stationarity

  • Prices are non-stationary: They trend over time (random walk)
  • Returns are stationary: They fluctuate around a constant mean
  • Always work with returns for statistical modeling
  • Use the ADF test to formally test stationarity

2. Autocorrelation

  • Returns have little autocorrelation: Consistent with market efficiency
  • Squared returns have strong autocorrelation: Volatility clustering
  • Use ACF/PACF plots to visualize correlation structure
  • Ljung-Box test for formal significance testing

3. Moving Statistics

  • Rolling: Fixed window, good for trends
  • Expanding: Cumulative, good for overall statistics
  • EWM: Weighted toward recent, good for volatility
  • Window size is a bias-variance tradeoff
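The window-size tradeoff in the last bullet is easy to see on synthetic data with a volatility regime change. Everything below (the seed, regime lengths, and volatilities) is illustrative rather than taken from the notebook's downloads:

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: a calm regime followed by a turbulent one
rng = np.random.default_rng(42)
returns = pd.Series(np.concatenate([
    rng.normal(0, 0.01, 250),   # ~16% annualized volatility
    rng.normal(0, 0.03, 250),   # ~48% annualized volatility
]))

# Short window: low bias, high variance (noisy but quick to adapt)
short_vol = returns.rolling(20).std() * np.sqrt(252)
# Long window: high bias, low variance (smooth but slow to adapt)
long_vol = returns.rolling(120).std() * np.sqrt(252)

# Twenty days after the regime change, the short window has caught up,
# while the long window still averages in the calm regime
print(f'20-day vol at t=270:  {short_vol.iloc[270]:.2%}')
print(f'120-day vol at t=270: {long_vol.iloc[270]:.2%}')
```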

4. Volatility Modeling

  • Volatility clusters: Big moves follow big moves
  • EWMA reacts faster than simple rolling windows
  • GARCH is the standard model for volatility
  • Volatility IS predictable (unlike returns)

Key Formulas

| Concept | Formula/Test |
|---|---|
| Stationarity | ADF test (p < 0.05 ⇒ stationary) |
| Autocorrelation | ρ_k = Cov(X_t, X_{t-k}) / Var(X_t) |
| Rolling Mean | mean over [t-w, t] window |
| Rolling Vol | σ_t = std([t-w, t]) × √252 |
| EWMA Vol | σ²_t = λσ²_{t-1} + (1-λ)r²_{t-1} |
| GARCH(1,1) | σ²_t = ω + αε²_{t-1} + βσ²_{t-1} |
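The GARCH(1,1) recursion can be filtered in a few lines of numpy. The parameter values below are illustrative, not estimated; in practice ω, α, and β are fitted by maximum likelihood (e.g. with the third-party arch package):

```python
import numpy as np

# Illustrative GARCH(1,1) parameters (alpha + beta < 1 keeps variance stationary)
omega, alpha, beta = 1e-6, 0.08, 0.90

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.01, 500)   # stand-in for daily returns

# sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
sigma2 = np.empty_like(returns)
sigma2[0] = omega / (1 - alpha - beta)   # initialize at the unconditional variance
for t in range(1, len(returns)):
    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]

print(f'Unconditional daily vol (ann.): {np.sqrt(sigma2[0] * 252):.2%}')
print(f'Final conditional vol (ann.):   {np.sqrt(sigma2[-1] * 252):.2%}')
```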

Coming Up Next

In Part 2: Portfolio Theory, we'll learn:

  • Modern Portfolio Theory (Markowitz)
  • Mean-Variance Optimization
  • The Efficient Frontier
  • The Capital Asset Pricing Model (CAPM)


Congratulations on completing Part 1: Statistical Foundations! You now have the statistical toolkit for quantitative finance.

Module 4: Modern Portfolio Theory

Course 3: Quantitative Finance
Part 2: Portfolio Theory


Learning Objectives

By the end of this module, you will be able to:

  1. Calculate portfolio returns and risk using matrix operations
  2. Understand and quantify the diversification effect
  3. Analyze the risk-return tradeoff
  4. Build and analyze two-asset portfolios

| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 3: Time Series Analysis |

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Data

# Download data for a diverse portfolio
tickers = ['AAPL', 'MSFT', 'JNJ', 'XOM', 'GLD']  # Tech (x2), Healthcare, Energy, Gold
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("Downloading portfolio data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

prices.columns = [str(col) for col in prices.columns]
returns = prices.pct_change().dropna()

# Calculate annualized statistics
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
cov_matrix = returns.cov() * 252
n_assets = len(tickers)

print(f"\nData loaded: {len(prices)} trading days")
print(f"Assets: {list(prices.columns)}")

Section 4.1: Portfolio Returns & Risk

In 1952, Harry Markowitz revolutionized finance with Modern Portfolio Theory (MPT). The key insight: don't put all your eggs in one basket.

In this section, you will learn:

  • How to calculate portfolio returns as weighted averages
  • Why portfolio risk is NOT a weighted average
  • The role of the covariance matrix

4.1.1 Portfolio Return Formula

The portfolio return is the weighted average of individual asset returns:

$$R_p = \sum_{i=1}^{n} w_i R_i = w_1 R_1 + w_2 R_2 + ... + w_n R_n$$

Where:

  • $R_p$ = Portfolio return
  • $w_i$ = Weight of asset $i$
  • $R_i$ = Return of asset $i$
  • $\sum w_i = 1$ (weights sum to 100%)

# Display individual asset statistics
print("Individual Asset Statistics (Annualized)")
print("=" * 50)
stats_df = pd.DataFrame({
    'Expected Return': annual_returns,
    'Volatility': annual_volatility,
    'Sharpe (rf=0)': annual_returns / annual_volatility
})
print(stats_df.round(4))
# Create an equal-weighted portfolio
equal_weights = np.array([1/n_assets] * n_assets)

# Portfolio expected return (weighted average)
portfolio_return = np.dot(equal_weights, annual_returns)
print(f"Equal-weighted portfolio: {dict(zip(tickers, equal_weights.round(4)))}")
print(f"\nPortfolio Expected Return: {portfolio_return:.4f} ({portfolio_return*100:.2f}%)")

# Naive expectation of volatility (weighted average - WRONG!)
naive_volatility = np.dot(equal_weights, annual_volatility)
print(f"Naive Portfolio Volatility (weighted avg): {naive_volatility:.4f} ({naive_volatility*100:.2f}%)")

4.1.2 Portfolio Risk Formula

Portfolio risk is NOT a simple weighted average! It depends on how assets move together (covariance):

$$\sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \sigma_{ij}$$

Or in matrix notation:

$$\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$$

Where:

  • $\sigma_p^2$ = Portfolio variance
  • $\Sigma$ = Covariance matrix
  • $\mathbf{w}$ = Weight vector

# Display the covariance matrix
print("Annualized Covariance Matrix")
print("=" * 60)
print(cov_matrix.round(4))
# Calculate TRUE portfolio volatility using matrix math
portfolio_variance = np.dot(equal_weights.T, np.dot(cov_matrix, equal_weights))
portfolio_volatility = np.sqrt(portfolio_variance)

print("Portfolio Risk Calculation")
print("=" * 50)
print(f"True Portfolio Volatility: {portfolio_volatility:.4f} ({portfolio_volatility*100:.2f}%)")
print(f"Naive (weighted avg):      {naive_volatility:.4f} ({naive_volatility*100:.2f}%)")
print(f"\nRisk Reduction from Diversification: {(1 - portfolio_volatility/naive_volatility)*100:.2f}%")

Exercise 4.1: Calculate Custom Portfolio Statistics (Guided)

Your Task: Create a custom portfolio with weights [30%, 25%, 20%, 15%, 10%] and calculate its return and volatility.

Fill in the blanks to complete the function:

Solution:
def calculate_portfolio_stats(weights: np.ndarray, 
                              expected_returns: pd.Series, 
                              cov_matrix: pd.DataFrame) -> dict:
    """Calculate portfolio return and volatility."""
    port_return = np.dot(weights, expected_returns)
    port_variance = np.dot(weights.T, np.dot(cov_matrix, weights))
    port_volatility = np.sqrt(port_variance)

    return {
        'return': port_return,
        'volatility': port_volatility,
        'sharpe': port_return / port_volatility
    }

custom_weights = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
stats = calculate_portfolio_stats(custom_weights, annual_returns, cov_matrix)
print(f"Return: {stats['return']*100:.2f}%, Volatility: {stats['volatility']*100:.2f}%")

Section 4.2: Diversification

Diversification is the "only free lunch in finance": you can reduce risk without reducing expected return.

In this section, you will learn:

  • The role of correlation in diversification
  • How risk decreases as we add more assets
  • The difference between systematic and idiosyncratic risk

4.2.1 The Role of Correlation

Diversification benefits depend on correlation between assets:

| Correlation | Diversification Benefit |
|---|---|
| ρ = +1 | None (assets move perfectly together) |
| ρ = 0 | Good (assets move independently) |
| ρ = -1 | Perfect (can eliminate all risk!) |

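The table can be checked numerically with the two-asset variance formula (covered in detail in Section 4.4). A sketch with two hypothetical assets, each at 20% volatility, held 50/50:

```python
import numpy as np

vol_a = vol_b = 0.20   # two hypothetical assets, equal volatility
w = 0.5                # 50/50 allocation

results = {}
for rho in (1.0, 0.0, -1.0):
    # Two-asset portfolio variance for correlation rho
    var = w**2 * vol_a**2 + w**2 * vol_b**2 + 2 * w * w * vol_a * vol_b * rho
    results[rho] = np.sqrt(var)
    print(f'rho = {rho:+.0f}: portfolio vol = {results[rho]:.2%}')
```

At ρ = +1 nothing is gained (volatility stays at 20%); at ρ = 0 it falls by a factor of √2; at ρ = -1 it vanishes entirely.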
# Calculate and visualize correlation matrix
corr_matrix = returns.corr()

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(corr_matrix, cmap='RdYlGn', vmin=-1, vmax=1)
ax.set_xticks(range(len(tickers)))
ax.set_yticks(range(len(tickers)))
ax.set_xticklabels(tickers)
ax.set_yticklabels(tickers)

for i in range(len(tickers)):
    for j in range(len(tickers)):
        ax.text(j, i, f'{corr_matrix.iloc[i, j]:.2f}',
                ha='center', va='center', fontsize=12)

plt.colorbar(im, label='Correlation')
plt.title('Asset Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

4.2.2 Diversification by Number of Assets

# Simulate diversification effect
np.random.seed(42)
n_simulations = 1000
asset_counts = range(1, n_assets + 1)
avg_volatilities = []

for n in asset_counts:
    volatilities = []
    for _ in range(n_simulations):
        selected = np.random.choice(n_assets, n, replace=False)
        w = np.zeros(n_assets)
        w[selected] = 1/n
        vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
        volatilities.append(vol)
    avg_volatilities.append(np.mean(volatilities))

plt.figure(figsize=(10, 6))
plt.plot(asset_counts, avg_volatilities, 'bo-', linewidth=2, markersize=10)
plt.axhline(y=avg_volatilities[-1], color='g', linestyle='--', alpha=0.7, 
            label=f'Fully diversified: {avg_volatilities[-1]*100:.1f}%')
plt.axhline(y=annual_volatility.mean(), color='r', linestyle='--', alpha=0.7,
            label=f'Avg single asset: {annual_volatility.mean()*100:.1f}%')
plt.xlabel('Number of Assets', fontsize=12)
plt.ylabel('Average Portfolio Volatility', fontsize=12)
plt.title('Diversification Effect', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

4.2.3 Systematic vs. Idiosyncratic Risk

Diversification can only eliminate idiosyncratic (unsystematic) risk—the risk specific to individual assets.

Systematic risk (market risk) cannot be diversified away because it affects all assets.

$$\text{Total Risk} = \text{Systematic Risk} + \text{Idiosyncratic Risk}$$
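A one-factor simulation makes the decomposition concrete: each synthetic asset is β times a common market return plus its own noise, so equal weighting averages away the idiosyncratic term but not the market term. All parameters here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, beta, market_vol, idio_vol = 1000, 1.0, 0.01, 0.02
market = rng.normal(0, market_vol, n_days)   # common factor: hits every asset

def equal_weight_vol(n_assets: int) -> float:
    """Annualized vol of an equal-weighted portfolio of one-factor assets."""
    idio = rng.normal(0, idio_vol, (n_days, n_assets))   # asset-specific noise
    assets = beta * market[:, None] + idio               # r_i = beta * r_m + eps_i
    return assets.mean(axis=1).std() * np.sqrt(252)

vols = {n: equal_weight_vol(n) for n in (1, 5, 25, 100)}
for n, v in vols.items():
    print(f'{n:>3} assets: {v:.2%}')

# Idiosyncratic risk shrinks like 1/sqrt(n); the systematic floor remains
print(f'Systematic floor: {beta * market_vol * np.sqrt(252):.2%}')
```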


Exercise 4.2: Find Best Diversification Pairs (Guided)

Your Task: Loop through all asset pairs and find the 50/50 portfolio with the lowest volatility.

Fill in the blanks:

Solution:
def find_best_pair(returns: pd.DataFrame, tickers: list) -> tuple:
    """Find the two-asset 50/50 portfolio with lowest volatility."""
    best_pair = None
    best_volatility = float('inf')

    for i in range(len(tickers)):
        for j in range(i+1, len(tickers)):
            pair_returns = returns[[tickers[i], tickers[j]]]
            pair_cov = pair_returns.cov() * 252
            weights = np.array([0.5, 0.5])
            port_vol = np.sqrt(np.dot(weights.T, np.dot(pair_cov, weights)))

            if port_vol < best_volatility:
                best_volatility = port_vol
                best_pair = (tickers[i], tickers[j])

    return best_pair, best_volatility

pair, vol = find_best_pair(returns, tickers)
print(f"Best pair: {pair}, Volatility: {vol*100:.2f}%")

Exercise 4.3: Correlation Impact Analysis (Open-ended)

Your Task:

Build a function that:

  • Takes two assets and simulates different correlation values (-1 to +1)
  • Calculates the minimum achievable portfolio volatility for each correlation
  • Returns a DataFrame showing the relationship between correlation and minimum risk

Your implementation:

Solution:
def correlation_impact_analysis(vol_a: float, vol_b: float,
                                ret_a: float, ret_b: float) -> pd.DataFrame:
    """Analyze how correlation affects minimum portfolio risk."""
    correlations = np.linspace(-1, 1, 21)
    results = []

    for corr in correlations:
        # Analytical minimum variance weight for asset A
        numerator = vol_b**2 - vol_a * vol_b * corr
        denominator = vol_a**2 + vol_b**2 - 2 * vol_a * vol_b * corr

        if denominator > 0:
            w_a = numerator / denominator
            w_a = max(0, min(1, w_a))  # Bound to [0, 1]
        else:
            w_a = 0.5

        w_b = 1 - w_a

        # Calculate portfolio volatility
        var = (w_a**2 * vol_a**2 + w_b**2 * vol_b**2 + 
               2 * w_a * w_b * vol_a * vol_b * corr)
        min_vol = np.sqrt(max(var, 0))

        results.append({
            'Correlation': corr,
            'Weight_A': w_a,
            'Min_Volatility': min_vol,
            'Risk_Reduction': (1 - min_vol / ((vol_a + vol_b) / 2)) * 100
        })

    return pd.DataFrame(results)

# Test with AAPL and GLD
analysis = correlation_impact_analysis(
    annual_volatility['AAPL'], annual_volatility['GLD'],
    annual_returns['AAPL'], annual_returns['GLD']
)
print(analysis.to_string(index=False))

Section 4.3: The Risk-Return Tradeoff

In finance, there's a fundamental relationship: higher expected returns require taking more risk.

In this section, you will learn:

  • How to visualize the risk-return space
  • The concept of dominated portfolios
  • Finding optimal portfolios through random sampling

4.3.1 Visualizing Risk-Return Space

# Generate random portfolios
np.random.seed(42)
n_portfolios = 5000

portfolio_returns_list = []
portfolio_volatilities_list = []
portfolio_sharpes_list = []
portfolio_weights_list = []

for _ in range(n_portfolios):
    weights = np.random.random(n_assets)
    weights = weights / weights.sum()
    
    ret = np.dot(weights, annual_returns)
    vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    sharpe = ret / vol
    
    portfolio_returns_list.append(ret)
    portfolio_volatilities_list.append(vol)
    portfolio_sharpes_list.append(sharpe)
    portfolio_weights_list.append(weights)
# Plot risk-return space
plt.figure(figsize=(12, 8))

scatter = plt.scatter(portfolio_volatilities_list, portfolio_returns_list,
                     c=portfolio_sharpes_list, cmap='viridis', alpha=0.5, s=10)
plt.colorbar(scatter, label='Sharpe Ratio')

# Individual assets
for ticker in tickers:
    plt.scatter(annual_volatility[ticker], annual_returns[ticker],
               s=200, marker='*', edgecolors='black', linewidth=2, zorder=5)
    plt.annotate(ticker, (annual_volatility[ticker], annual_returns[ticker]),
                xytext=(10, 5), textcoords='offset points', fontsize=12, fontweight='bold')

# Equal-weighted portfolio
plt.scatter(portfolio_volatility, portfolio_return,
           s=300, marker='D', c='red', edgecolors='black', linewidth=2,
           label='Equal-Weighted', zorder=5)

plt.xlabel('Volatility (Risk)', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title('Risk-Return Space: Individual Assets and Random Portfolios', fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Find best portfolios from random sampling
max_sharpe_idx = np.argmax(portfolio_sharpes_list)
min_vol_idx = np.argmin(portfolio_volatilities_list)

print("Notable Portfolios from Random Sampling")
print("=" * 60)

print("\nMaximum Sharpe Ratio Portfolio:")
print(f"  Return: {portfolio_returns_list[max_sharpe_idx]*100:.2f}%")
print(f"  Volatility: {portfolio_volatilities_list[max_sharpe_idx]*100:.2f}%")
print(f"  Sharpe: {portfolio_sharpes_list[max_sharpe_idx]:.4f}")

print("\nMinimum Volatility Portfolio:")
print(f"  Return: {portfolio_returns_list[min_vol_idx]*100:.2f}%")
print(f"  Volatility: {portfolio_volatilities_list[min_vol_idx]*100:.2f}%")
print(f"  Sharpe: {portfolio_sharpes_list[min_vol_idx]:.4f}")

Exercise 4.4: Risk Contribution Analysis (Guided)

Your Task: Calculate the marginal and percentage risk contribution of each asset in a portfolio.

Fill in the blanks:

Solution:
def calculate_risk_contribution(weights: np.ndarray, 
                                cov_matrix: pd.DataFrame) -> pd.DataFrame:
    """Calculate risk contribution of each asset."""
    port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
    mcr = np.dot(cov_matrix, weights) / port_vol
    ccr = weights * mcr
    pcr = ccr / port_vol * 100

    return pd.DataFrame({
        'Weight': weights,
        'Marginal_Risk': mcr,
        'Component_Risk': ccr,
        'Pct_of_Risk': pcr
    }, index=cov_matrix.columns)

risk_contrib = calculate_risk_contribution(equal_weights, cov_matrix)
print(risk_contrib)

Section 4.4: Two-Asset Portfolio Analysis

Before tackling complex multi-asset portfolios, let's build intuition with just two assets.

In this section, you will learn:

  • The two-asset portfolio formulas
  • How to find the minimum variance portfolio analytically
  • The effect of correlation on the portfolio frontier

4.4.1 Two-Asset Portfolio Formulas

For a portfolio of assets A and B:

Return: $R_p = w_A R_A + (1-w_A) R_B$

Variance: $\sigma_p^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2 w_A w_B \sigma_A \sigma_B \rho_{AB}$

Minimum Variance Weight: $$w_A^* = \frac{\sigma_B^2 - \sigma_A \sigma_B \rho_{AB}}{\sigma_A^2 + \sigma_B^2 - 2\sigma_A \sigma_B \rho_{AB}}$$
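As a sanity check, the analytical weight can be compared against a brute-force grid search over $w_A$. The volatilities and correlation below are made up rather than the notebook's AAPL/GLD estimates:

```python
import numpy as np

vol_a, vol_b, rho = 0.30, 0.15, 0.10   # illustrative inputs

# Analytical minimum variance weight for asset A
w_star = (vol_b**2 - vol_a * vol_b * rho) / (vol_a**2 + vol_b**2 - 2 * vol_a * vol_b * rho)

# Brute force: evaluate the portfolio variance on a fine grid of weights
grid = np.linspace(0, 1, 100_001)
var = (grid**2 * vol_a**2 + (1 - grid)**2 * vol_b**2
       + 2 * grid * (1 - grid) * vol_a * vol_b * rho)
w_grid = grid[np.argmin(var)]

print(f'Analytic w_A:    {w_star:.4f}')
print(f'Grid-search w_A: {w_grid:.4f}')
```

The two agree to grid precision, which is a useful habit whenever a closed-form optimum is available.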

# Two-asset analysis: AAPL vs GLD
asset_a, asset_b = 'AAPL', 'GLD'
ret_a = annual_returns[asset_a]
ret_b = annual_returns[asset_b]
vol_a = annual_volatility[asset_a]
vol_b = annual_volatility[asset_b]
corr_ab = corr_matrix.loc[asset_a, asset_b]

print(f"Two-Asset Analysis: {asset_a} vs {asset_b}")
print("=" * 50)
print(f"\n{asset_a}: Return={ret_a*100:.2f}%, Vol={vol_a*100:.2f}%")
print(f"{asset_b}: Return={ret_b*100:.2f}%, Vol={vol_b*100:.2f}%")
print(f"Correlation: {corr_ab:.4f}")
# Generate portfolios across weight combinations
weights_a = np.linspace(0, 1, 101)
port_returns_2asset = []
port_vols_2asset = []

for w_a in weights_a:
    w_b = 1 - w_a
    ret = w_a * ret_a + w_b * ret_b
    var = (w_a**2 * vol_a**2 + w_b**2 * vol_b**2 + 
           2 * w_a * w_b * vol_a * vol_b * corr_ab)
    port_returns_2asset.append(ret)
    port_vols_2asset.append(np.sqrt(var))

# Find minimum variance
min_var_idx = np.argmin(port_vols_2asset)
# Plot two-asset frontier
plt.figure(figsize=(12, 8))
plt.plot(port_vols_2asset, port_returns_2asset, 'b-', linewidth=3, label='Portfolio Combinations')

plt.scatter(vol_a, ret_a, s=300, marker='*', c='red', edgecolors='black', 
           linewidth=2, zorder=5, label=asset_a)
plt.scatter(vol_b, ret_b, s=300, marker='*', c='gold', edgecolors='black', 
           linewidth=2, zorder=5, label=asset_b)
plt.scatter(port_vols_2asset[min_var_idx], port_returns_2asset[min_var_idx],
           s=200, marker='D', c='green', edgecolors='black', linewidth=2, 
           zorder=5, label=f'Min Var ({weights_a[min_var_idx]*100:.0f}% {asset_a})')

plt.xlabel('Volatility (Risk)', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title(f'Two-Asset Portfolio: {asset_a} and {asset_b}\n(Correlation: {corr_ab:.3f})', 
          fontsize=14, fontweight='bold')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Exercise 4.5: Analytical Minimum Variance (Open-ended)

Your Task:

Build a function that:

  • Takes two asset volatilities and their correlation
  • Calculates the optimal weight for minimum variance using the analytical formula
  • Returns the optimal weights and the resulting portfolio volatility

Your implementation:

Solution:
def analytical_min_variance(vol_a: float, vol_b: float, 
                            correlation: float) -> dict:
    """
    Calculate minimum variance portfolio weights analytically.

    Args:
        vol_a: Volatility of asset A
        vol_b: Volatility of asset B
        correlation: Correlation between A and B

    Returns:
        Dictionary with optimal weights and portfolio volatility
    """
    numerator = vol_b**2 - vol_a * vol_b * correlation
    denominator = vol_a**2 + vol_b**2 - 2 * vol_a * vol_b * correlation

    if denominator == 0:
        w_a = 0.5
    else:
        w_a = numerator / denominator

    w_b = 1 - w_a

    # Calculate portfolio volatility
    port_var = (w_a**2 * vol_a**2 + w_b**2 * vol_b**2 + 
                2 * w_a * w_b * vol_a * vol_b * correlation)
    port_vol = np.sqrt(max(port_var, 0))

    return {
        'weight_a': w_a,
        'weight_b': w_b,
        'portfolio_volatility': port_vol
    }

# Test
result = analytical_min_variance(vol_a, vol_b, corr_ab)
print(f"Optimal weight A: {result['weight_a']*100:.2f}%")
print(f"Optimal weight B: {result['weight_b']*100:.2f}%")
print(f"Min portfolio vol: {result['portfolio_volatility']*100:.2f}%")

Exercise 4.6: Complete Portfolio Analyzer (Open-ended)

Your Task:

Build a PortfolioAnalyzer class that:

  • Takes a list of tickers and downloads data
  • Calculates all individual asset statistics
  • Computes correlation and covariance matrices
  • Generates random portfolios and finds the best ones
  • Provides a summary method that displays all key metrics

Your implementation:

Solution:
class PortfolioAnalyzer:
    """Comprehensive portfolio analysis tool."""

    def __init__(self, tickers: list, years: int = 5):
        self.tickers = tickers
        self.n_assets = len(tickers)
        self._load_data(years)
        self._calculate_statistics()

    def _load_data(self, years: int):
        """Download and prepare price data."""
        end = datetime.now()
        start = end - timedelta(days=years*365)

        data = yf.download(self.tickers, start=start, end=end, progress=False)

        if isinstance(data.columns, pd.MultiIndex):
            self.prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
        else:
            self.prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

        self.prices.columns = [str(c) for c in self.prices.columns]
        self.returns = self.prices.pct_change().dropna()

    def _calculate_statistics(self):
        """Calculate all statistics."""
        self.annual_returns = self.returns.mean() * 252
        self.annual_volatility = self.returns.std() * np.sqrt(252)
        self.cov_matrix = self.returns.cov() * 252
        self.corr_matrix = self.returns.corr()

    def portfolio_stats(self, weights: np.ndarray) -> dict:
        """Calculate portfolio statistics."""
        ret = np.dot(weights, self.annual_returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        return {'return': ret, 'volatility': vol, 'sharpe': ret / vol}

    def find_optimal_portfolios(self, n_samples: int = 5000) -> dict:
        """Find optimal portfolios through random sampling."""
        np.random.seed(42)

        best_sharpe = {'sharpe': -np.inf}
        min_vol = {'volatility': np.inf}

        for _ in range(n_samples):
            w = np.random.random(self.n_assets)
            w = w / w.sum()
            stats = self.portfolio_stats(w)
            stats['weights'] = w

            if stats['sharpe'] > best_sharpe['sharpe']:
                best_sharpe = stats.copy()
            if stats['volatility'] < min_vol['volatility']:
                min_vol = stats.copy()

        return {'max_sharpe': best_sharpe, 'min_volatility': min_vol}

    def summary(self):
        """Display comprehensive summary."""
        print("=" * 60)
        print("PORTFOLIO ANALYZER SUMMARY")
        print("=" * 60)

        print("\nAsset Statistics:")
        stats_df = pd.DataFrame({
            'Return': self.annual_returns,
            'Volatility': self.annual_volatility,
            'Sharpe': self.annual_returns / self.annual_volatility
        })
        print(stats_df.round(4))

        optimal = self.find_optimal_portfolios()
        print("\nOptimal Portfolios:")
        for name, stats in optimal.items():
            print(f"\n{name}:")
            print(f"  Return: {stats['return']*100:.2f}%")
            print(f"  Volatility: {stats['volatility']*100:.2f}%")
            print(f"  Sharpe: {stats['sharpe']:.4f}")

# Test
analyzer = PortfolioAnalyzer(['AAPL', 'MSFT', 'JNJ', 'GLD'])
analyzer.summary()

Module Project: Build Your Own Diversified Portfolio

Apply everything you've learned to construct and analyze a diversified portfolio.

Your Challenge:

Build a complete portfolio analysis that:

  1. Creates a custom portfolio with your chosen weights
  2. Calculates all risk and return metrics
  3. Compares to individual assets and the equal-weighted benchmark
  4. Analyzes risk contribution by asset
  5. Visualizes the results

# YOUR CODE HERE - Module Project
Solution:
# Complete Portfolio Analysis Project

# Step 1: Define custom portfolio
my_weights = np.array([0.35, 0.30, 0.15, 0.10, 0.10])
print("My Portfolio Allocation")
print("=" * 40)
for ticker, weight in zip(tickers, my_weights):
    print(f"  {ticker}: {weight*100:.1f}%")

# Step 2: Calculate statistics
my_return = np.dot(my_weights, annual_returns)
my_variance = np.dot(my_weights.T, np.dot(cov_matrix, my_weights))
my_volatility = np.sqrt(my_variance)
my_sharpe = my_return / my_volatility

print(f"\nPortfolio Statistics")
print(f"  Return: {my_return*100:.2f}%")
print(f"  Volatility: {my_volatility*100:.2f}%")
print(f"  Sharpe: {my_sharpe:.4f}")

# Step 3: Compare to benchmarks
comparison = pd.DataFrame({
    'My Portfolio': [my_return, my_volatility, my_sharpe],
    'Equal-Weighted': [portfolio_return, portfolio_volatility, portfolio_return/portfolio_volatility]
}, index=['Return', 'Volatility', 'Sharpe'])

for ticker in tickers:
    comparison[ticker] = [annual_returns[ticker], 
                         annual_volatility[ticker], 
                         annual_returns[ticker]/annual_volatility[ticker]]

print("\nComparison")
print(comparison.round(4).T)

# Step 4: Risk contribution
mcr = np.dot(cov_matrix, my_weights) / my_volatility
ccr = my_weights * mcr
pcr = ccr / my_volatility * 100

print("\nRisk Contribution")
risk_df = pd.DataFrame({
    'Weight': my_weights,
    'Pct_Risk': pcr
}, index=tickers)
print(risk_df.round(2))

# Step 5: Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].pie(my_weights, labels=tickers, autopct='%1.1f%%')
axes[0].set_title('Weight Allocation')
axes[1].pie(pcr, labels=tickers, autopct='%1.1f%%')
axes[1].set_title('Risk Contribution')
plt.tight_layout()
plt.show()

Key Takeaways

What You Learned

1. Portfolio Returns

  • Portfolio return is the weighted average of asset returns
  • Formula: $R_p = \sum w_i R_i$

2. Portfolio Risk

  • Portfolio risk is NOT a weighted average
  • Depends on covariances between assets
  • Formula: $\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$

3. Diversification

  • Lower correlation = better diversification
  • Can only eliminate idiosyncratic risk, not systematic risk
  • The "only free lunch" in finance

4. Two-Asset Portfolios

  • Analytical solutions exist for minimum variance
  • Correlation determines the shape of the frontier

Key Formulas

| Metric | Formula |
|---|---|
| Portfolio Return | $R_p = \sum w_i R_i$ |
| Portfolio Variance | $\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$ |
| Min Variance Weight | $w_A^* = \frac{\sigma_B^2 - \sigma_A\sigma_B\rho}{\sigma_A^2 + \sigma_B^2 - 2\sigma_A\sigma_B\rho}$ |

Coming Up Next

In Module 5: Mean-Variance Optimization, we'll learn how to find the optimal portfolio weights using mathematical optimization.


Congratulations on completing Module 4!

Module 5: Mean-Variance Optimization

Course 3: Quantitative Finance
Part 2: Portfolio Theory


Learning Objectives

By the end of this module, you will be able to:

  1. Formulate portfolio optimization as a mathematical problem
  2. Implement optimization using scipy
  3. Apply realistic constraints (long-only, position limits)
  4. Find optimal portfolios for different objectives

| Attribute | Value |
|---|---|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 4: Modern Portfolio Theory |

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Data

# Download data for portfolio optimization
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JNJ', 'JPM', 'XOM', 'GLD']
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("Downloading portfolio data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1) if 'Close' in data.columns.get_level_values(1) else data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

prices.columns = [str(col) for col in prices.columns]
returns = prices.pct_change().dropna()

# Calculate annualized statistics
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
cov_matrix = returns.cov() * 252
n_assets = len(tickers)

print(f"\nData loaded: {len(prices)} trading days")
print(f"Assets: {list(prices.columns)}")
# Display asset statistics
stats_df = pd.DataFrame({
    'Expected Return': annual_returns,
    'Volatility': annual_volatility,
    'Sharpe (rf=0)': annual_returns / annual_volatility
}).sort_values('Sharpe (rf=0)', ascending=False)

print("Individual Asset Statistics (Annualized)")
print("=" * 55)
print(stats_df.round(4))

Section 5.1: The Optimization Problem

Mean-Variance Optimization (MVO) is the mathematical framework that earned Harry Markowitz the Nobel Prize.

In this section, you will learn:

  • How to formulate portfolio optimization mathematically
  • The minimum variance and maximum Sharpe objectives
  • Core portfolio metric functions

5.1.1 Mathematical Formulation

Minimize Portfolio Variance:

$$\min_{\mathbf{w}} \quad \mathbf{w}^T \Sigma \mathbf{w}$$

Subject to:

$$\mathbf{w}^T \mathbf{\mu} = R_{target} \quad \text{(target return)}$$

$$\mathbf{w}^T \mathbf{1} = 1 \quad \text{(weights sum to 1)}$$

Where:

  • $\mathbf{w}$ = vector of portfolio weights
  • $\Sigma$ = covariance matrix
  • $\mathbf{\mu}$ = vector of expected returns
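The two-constraint formulation above maps directly onto scipy.optimize.minimize. The sketch below uses made-up mu, Sigma, and an 11% target return as stand-ins for the notebook's annual_returns and cov_matrix:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative stand-ins for annual_returns and cov_matrix
mu = np.array([0.08, 0.12, 0.15])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
target = 0.11

constraints = [
    {'type': 'eq', 'fun': lambda w: w @ mu - target},   # hit the target return
    {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},     # fully invested
]

result = minimize(lambda w: w @ Sigma @ w,              # minimize variance
                  np.ones(3) / 3, method='SLSQP', constraints=constraints)
w = result.x

print(f'Weights:    {np.round(w, 4)}')
print(f'Return:     {w @ mu:.2%} (target {target:.2%})')
print(f'Volatility: {np.sqrt(w @ Sigma @ w):.2%}')
```

Without bounds this formulation allows short positions; adding bounds of (0, 1) per asset gives the long-only version covered in Section 5.2.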

# Define core portfolio functions
def portfolio_return(weights: np.ndarray, returns: pd.Series) -> float:
    """Calculate portfolio expected return."""
    return np.dot(weights, returns)

def portfolio_volatility(weights: np.ndarray, cov_matrix: pd.DataFrame) -> float:
    """Calculate portfolio volatility."""
    return np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))

def portfolio_sharpe(weights: np.ndarray, returns: pd.Series, 
                     cov_matrix: pd.DataFrame, rf: float = 0) -> float:
    """Calculate portfolio Sharpe ratio."""
    ret = portfolio_return(weights, returns)
    vol = portfolio_volatility(weights, cov_matrix)
    return (ret - rf) / vol

print("Portfolio functions defined")
# Test with equal weights
equal_weights = np.array([1/n_assets] * n_assets)

print("Equal-Weighted Portfolio Test")
print("=" * 40)
print(f"Return: {portfolio_return(equal_weights, annual_returns)*100:.2f}%")
print(f"Volatility: {portfolio_volatility(equal_weights, cov_matrix)*100:.2f}%")
print(f"Sharpe Ratio: {portfolio_sharpe(equal_weights, annual_returns, cov_matrix):.4f}")

5.1.2 The Global Minimum Variance Portfolio

# Objective functions for optimization
def neg_sharpe(weights, returns, cov_matrix):
    """Negative Sharpe ratio (for minimization)."""
    return -portfolio_sharpe(weights, returns, cov_matrix)

def port_variance(weights, cov_matrix):
    """Portfolio variance (for minimization)."""
    return np.dot(weights.T, np.dot(cov_matrix, weights))

# Constraints: weights sum to 1
constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
initial_weights = np.array([1/n_assets] * n_assets)

# Optimize for minimum variance (unconstrained - allows short selling)
result_minvar = minimize(
    port_variance,
    initial_weights,
    args=(cov_matrix,),
    method='SLSQP',
    constraints=constraints
)

minvar_weights = result_minvar['x']

print("Global Minimum Variance Portfolio (Unconstrained)")
print("=" * 55)
print(f"\nOptimal Weights:")
for ticker, weight in zip(tickers, minvar_weights):
    print(f"  {ticker}: {weight*100:+.2f}%")

print(f"\nPortfolio Statistics:")
print(f"  Return: {portfolio_return(minvar_weights, annual_returns)*100:.2f}%")
print(f"  Volatility: {portfolio_volatility(minvar_weights, cov_matrix)*100:.2f}%")

Exercise 5.1: Maximum Sharpe Portfolio (Guided)

Your Task: Find the portfolio that maximizes the Sharpe ratio using scipy.optimize.minimize.

Fill in the blanks:

Solution:
def find_max_sharpe(returns: pd.Series, cov_matrix: pd.DataFrame) -> np.ndarray:
    """Find the maximum Sharpe ratio portfolio weights."""
    n = len(returns)
    initial = np.ones(n) / n

    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

    result = minimize(
        neg_sharpe,
        initial,
        args=(returns, cov_matrix),
        method='SLSQP',
        constraints=constraints
    )

    return result['x'] if result['success'] else None

maxsharpe_weights = find_max_sharpe(annual_returns, cov_matrix)
print(f"Max Sharpe: {portfolio_sharpe(maxsharpe_weights, annual_returns, cov_matrix):.4f}")
print(f"Return: {portfolio_return(maxsharpe_weights, annual_returns)*100:.2f}%")
print(f"Volatility: {portfolio_volatility(maxsharpe_weights, cov_matrix)*100:.2f}%")

Section 5.2: Solving with Scipy

The unconstrained optimization can produce extreme long and short positions. In practice, most portfolios cannot take short positions.

In this section, you will learn:

- How to use scipy.optimize.minimize effectively
- Adding bounds for long-only constraints
- Building a reusable optimizer class

5.2.1 Long-Only Constraint

# Long-only constraint: 0 <= w_i <= 1
bounds = tuple((0, 1) for _ in range(n_assets))

# Minimum variance with long-only constraint
result_minvar_long = minimize(
    port_variance,
    initial_weights,
    args=(cov_matrix,),
    method='SLSQP',
    bounds=bounds,
    constraints=constraints
)

minvar_long_weights = result_minvar_long['x']

print("Minimum Variance Portfolio (Long-Only)")
print("=" * 55)
for ticker, weight in zip(tickers, minvar_long_weights):
    if weight > 0.001:
        print(f"  {ticker}: {weight*100:.2f}%")

print(f"\nPortfolio Statistics:")
print(f"  Return: {portfolio_return(minvar_long_weights, annual_returns)*100:.2f}%")
print(f"  Volatility: {portfolio_volatility(minvar_long_weights, cov_matrix)*100:.2f}%")
# Compare unconstrained vs long-only
print("Impact of Long-Only Constraint")
print("=" * 50)
print(f"Unconstrained volatility: {portfolio_volatility(minvar_weights, cov_matrix)*100:.2f}%")
print(f"Long-only volatility:     {portfolio_volatility(minvar_long_weights, cov_matrix)*100:.2f}%")
cost = portfolio_volatility(minvar_long_weights, cov_matrix) - portfolio_volatility(minvar_weights, cov_matrix)
print(f"\nCost of constraint: +{cost*100:.2f}% volatility")

5.2.2 Reusable Portfolio Optimizer Class

class PortfolioOptimizer:
    """Mean-variance portfolio optimization."""
    
    def __init__(self, returns: pd.Series, cov_matrix: pd.DataFrame, rf: float = 0):
        self.returns = returns
        self.cov_matrix = cov_matrix
        self.rf = rf
        self.n_assets = len(returns)
        self.tickers = list(returns.index)
    
    def _port_return(self, weights):
        return np.dot(weights, self.returns)
    
    def _port_volatility(self, weights):
        return np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
    
    def _port_sharpe(self, weights):
        return (self._port_return(weights) - self.rf) / self._port_volatility(weights)
    
    def minimize_volatility(self, target_return=None, long_only=True):
        """Find minimum volatility portfolio."""
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        
        if target_return is not None:
            constraints.append({
                'type': 'eq',
                'fun': lambda w: self._port_return(w) - target_return
            })
        
        bounds = tuple((0, 1) for _ in range(self.n_assets)) if long_only else None
        
        result = minimize(
            lambda w: self._port_volatility(w),
            np.ones(self.n_assets) / self.n_assets,
            method='SLSQP',
            bounds=bounds,
            constraints=constraints
        )
        
        return result['x'] if result['success'] else None
    
    def maximize_sharpe(self, long_only=True):
        """Find maximum Sharpe ratio portfolio."""
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        bounds = tuple((0, 1) for _ in range(self.n_assets)) if long_only else None
        
        result = minimize(
            lambda w: -self._port_sharpe(w),
            np.ones(self.n_assets) / self.n_assets,
            method='SLSQP',
            bounds=bounds,
            constraints=constraints
        )
        
        return result['x'] if result['success'] else None
    
    def get_stats(self, weights):
        """Get portfolio statistics."""
        return {
            'return': self._port_return(weights),
            'volatility': self._port_volatility(weights),
            'sharpe': self._port_sharpe(weights),
            'weights': dict(zip(self.tickers, weights))
        }

print("PortfolioOptimizer class defined")
# Test the optimizer
optimizer = PortfolioOptimizer(annual_returns, cov_matrix)

min_vol_w = optimizer.minimize_volatility()
max_sharpe_w = optimizer.maximize_sharpe()

print("Portfolio Optimizer Results")
print("=" * 60)

print("\nMinimum Volatility Portfolio:")
stats = optimizer.get_stats(min_vol_w)
print(f"  Return: {stats['return']*100:.2f}%")
print(f"  Volatility: {stats['volatility']*100:.2f}%")
print(f"  Sharpe: {stats['sharpe']:.4f}")

print("\nMaximum Sharpe Portfolio:")
stats = optimizer.get_stats(max_sharpe_w)
print(f"  Return: {stats['return']*100:.2f}%")
print(f"  Volatility: {stats['volatility']*100:.2f}%")
print(f"  Sharpe: {stats['sharpe']:.4f}")

Exercise 5.2: Add Efficient Frontier Method (Guided)

Your Task: Add a method to the optimizer class that generates efficient frontier points.

Fill in the blanks:

Solution:
def efficient_frontier(optimizer, n_points: int = 50) -> pd.DataFrame:
    """Generate efficient frontier points."""
    min_ret = optimizer.returns.min()
    max_ret = optimizer.returns.max()

    target_returns = np.linspace(min_ret, max_ret, n_points)

    frontier_vols = []
    frontier_rets = []

    for target in target_returns:
        weights = optimizer.minimize_volatility(target_return=target, long_only=True)

        if weights is not None:
            frontier_rets.append(optimizer._port_return(weights))
            frontier_vols.append(optimizer._port_volatility(weights))

    return pd.DataFrame({'return': frontier_rets, 'volatility': frontier_vols})

frontier = efficient_frontier(optimizer)
print(frontier.head())

# Plot
plt.figure(figsize=(10, 6))
plt.plot(frontier['volatility'], frontier['return'], 'b-', linewidth=2)
plt.xlabel('Volatility')
plt.ylabel('Return')
plt.title('Efficient Frontier')
plt.grid(True, alpha=0.3)
plt.show()

Section 5.3: Portfolio Constraints

Real-world portfolios have many constraints beyond "no short selling".

In this section, you will learn:

- Position limits (max/min weights)
- Sector constraints
- Combining multiple constraint types

5.3.1 Position Limits

# Max 25% in any single asset
max_weight = 0.25
bounds_constrained = tuple((0, max_weight) for _ in range(n_assets))

result_constrained = minimize(
    lambda w: -portfolio_sharpe(w, annual_returns, cov_matrix),
    initial_weights,
    method='SLSQP',
    bounds=bounds_constrained,
    constraints=constraints
)

constrained_weights = result_constrained['x']

print(f"Maximum Sharpe Portfolio (Max {max_weight*100:.0f}% per asset)")
print("=" * 55)
for ticker, weight in zip(tickers, constrained_weights):
    if weight > 0.001:
        marker = " <- AT LIMIT" if abs(weight - max_weight) < 0.001 else ""
        print(f"  {ticker}: {weight*100:.2f}%{marker}")

print(f"\nSharpe: {portfolio_sharpe(constrained_weights, annual_returns, cov_matrix):.4f}")

5.3.2 Sector Constraints

# Define sector mappings
sectors = {
    'AAPL': 'Technology', 'MSFT': 'Technology', 
    'GOOGL': 'Technology', 'AMZN': 'Technology',
    'JNJ': 'Healthcare', 'JPM': 'Financial',
    'XOM': 'Energy', 'GLD': 'Commodities'
}

# Get tech stock indices
tech_idx = [i for i, t in enumerate(tickers) if sectors[t] == 'Technology']
print(f"Technology stocks: {[tickers[i] for i in tech_idx]}")
# Constraint: Tech sector <= 50%
max_tech = 0.50

sector_constraints = [
    {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
    {'type': 'ineq', 'fun': lambda w: max_tech - sum(w[i] for i in tech_idx)}
]

result_sector = minimize(
    lambda w: -portfolio_sharpe(w, annual_returns, cov_matrix),
    initial_weights,
    method='SLSQP',
    bounds=bounds,  # long-only
    constraints=sector_constraints
)

sector_weights = result_sector['x']

# Calculate sector exposures
sector_exposure = {}
for ticker, weight in zip(tickers, sector_weights):
    sector = sectors[ticker]
    sector_exposure[sector] = sector_exposure.get(sector, 0) + weight

print(f"Maximum Sharpe Portfolio (Tech <= {max_tech*100:.0f}%)")
print("=" * 55)
print("\nSector Exposure:")
for sector, exposure in sorted(sector_exposure.items(), key=lambda x: -x[1]):
    marker = " <- AT LIMIT" if sector == 'Technology' and abs(exposure - max_tech) < 0.01 else ""
    print(f"  {sector}: {exposure*100:.2f}%{marker}")

Exercise 5.3: Custom Constraints (Open-ended)

Your Task:

Build a function that:

- Takes min_weight and max_weight parameters
- Ensures every asset has at least min_weight allocation
- Maximizes the Sharpe ratio within these constraints
- Returns the optimal weights and portfolio statistics

Your implementation:

Solution:
def optimize_with_bounds(returns: pd.Series, cov_matrix: pd.DataFrame,
                         min_weight: float = 0.05, 
                         max_weight: float = 0.25) -> dict:
    """
    Optimize portfolio with minimum and maximum weight constraints.

    Args:
        returns: Expected returns
        cov_matrix: Covariance matrix
        min_weight: Minimum weight per asset
        max_weight: Maximum weight per asset

    Returns:
        Dictionary with weights and statistics
    """
    n = len(returns)
    tickers = list(returns.index)

    # Bounds with min and max
    bounds = tuple((min_weight, max_weight) for _ in range(n))

    # Constraint: weights sum to 1
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

    # Objective: maximize Sharpe (minimize negative Sharpe)
    def neg_sharpe(w):
        ret = np.dot(w, returns)
        vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
        return -ret / vol

    result = minimize(
        neg_sharpe,
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )

    if result['success']:
        weights = result['x']
        ret = np.dot(weights, returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))

        return {
            'weights': dict(zip(tickers, weights)),
            'return': ret,
            'volatility': vol,
            'sharpe': ret / vol
        }
    return None

# Test
result = optimize_with_bounds(annual_returns, cov_matrix, 0.05, 0.20)
print(f"Constrained Portfolio (5% <= w <= 20%)")
print(f"Return: {result['return']*100:.2f}%")
print(f"Volatility: {result['volatility']*100:.2f}%")
print(f"Sharpe: {result['sharpe']:.4f}")
print(f"\nWeights:")
for ticker, w in result['weights'].items():
    print(f"  {ticker}: {w*100:.2f}%")

Section 5.4: Target Return & Risk

Sometimes we want a portfolio that meets specific objectives like a target return or risk budget.

In this section, you will learn:

- Optimizing for a target return
- Optimizing for a target volatility
- Visualizing optimization results

5.4.1 Optimize for Target Return

def optimize_for_target_return(target_return: float, returns: pd.Series, 
                               cov_matrix: pd.DataFrame, long_only: bool = True):
    """Find minimum volatility portfolio for a target return."""
    n = len(returns)
    
    constraints = [
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
        {'type': 'eq', 'fun': lambda w: np.dot(w, returns) - target_return}
    ]
    
    bounds = tuple((0, 1) for _ in range(n)) if long_only else None
    
    result = minimize(
        lambda w: np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))),
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )
    
    return result['x'] if result['success'] else None

# Test with different targets
target_returns = [0.08, 0.12, 0.16, 0.20]

print("Portfolios for Different Target Returns")
print("=" * 55)
for target in target_returns:
    weights = optimize_for_target_return(target, annual_returns, cov_matrix)
    if weights is not None:
        vol = portfolio_volatility(weights, cov_matrix)
        print(f"Target {target*100:.0f}%: Volatility = {vol*100:.2f}%")
    else:
        print(f"Target {target*100:.0f}%: Not achievable")

5.4.2 Optimize for Target Volatility

def optimize_for_target_volatility(target_vol: float, returns: pd.Series,
                                   cov_matrix: pd.DataFrame, long_only: bool = True):
    """Find maximum return portfolio for a target volatility."""
    n = len(returns)
    
    constraints = [
        {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
        {'type': 'eq', 'fun': lambda w: np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))) - target_vol}
    ]
    
    bounds = tuple((0, 1) for _ in range(n)) if long_only else None
    
    result = minimize(
        lambda w: -np.dot(w, returns),
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )
    
    return result['x'] if result['success'] else None

# Test with different targets
target_vols = [0.12, 0.16, 0.20, 0.25]

print("Portfolios for Different Target Volatilities")
print("=" * 55)
for target in target_vols:
    weights = optimize_for_target_volatility(target, annual_returns, cov_matrix)
    if weights is not None:
        ret = portfolio_return(weights, annual_returns)
        print(f"Target Vol {target*100:.0f}%: Return = {ret*100:.2f}%")
    else:
        print(f"Target Vol {target*100:.0f}%: Not achievable")

Exercise 5.4: Efficient Frontier Visualization (Guided)

Your Task: Generate and plot the efficient frontier with random portfolios for context.

Fill in the blanks:

Solution:
def plot_efficient_frontier(returns: pd.Series, cov_matrix: pd.DataFrame,
                           n_frontier: int = 30, n_random: int = 3000):
    """Plot efficient frontier with random portfolios."""
    n = len(returns)
    tickers = list(returns.index)

    np.random.seed(42)
    random_rets = []
    random_vols = []

    for _ in range(n_random):
        w = np.random.random(n)
        w = w / w.sum()
        random_rets.append(np.dot(w, returns))
        random_vols.append(np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))))

    min_ret = returns.min()
    max_ret = returns.max()
    targets = np.linspace(min_ret, max_ret, n_frontier)

    frontier_rets = []
    frontier_vols = []

    for target in targets:
        w = optimize_for_target_return(target, returns, cov_matrix)
        if w is not None:
            frontier_rets.append(np.dot(w, returns))
            frontier_vols.append(np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))))

    plt.figure(figsize=(12, 8))
    plt.scatter(random_vols, random_rets, alpha=0.3, s=5, c='gray', label='Random Portfolios')
    plt.plot(frontier_vols, frontier_rets, 'b-', linewidth=2, label='Efficient Frontier')

    for ticker in tickers:
        vol = np.sqrt(cov_matrix.loc[ticker, ticker])
        ret = returns[ticker]
        plt.scatter(vol, ret, s=100, marker='*', zorder=5)
        plt.annotate(ticker, (vol, ret), xytext=(5, 5), textcoords='offset points')

    plt.xlabel('Volatility')
    plt.ylabel('Expected Return')
    plt.title('Efficient Frontier')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

plot_efficient_frontier(annual_returns, cov_matrix)

Exercise 5.5: Portfolio Optimizer with Multiple Strategies (Open-ended)

Your Task:

Build a function that:

- Compares multiple optimization strategies (min vol, max Sharpe, equal weight)
- Calculates statistics for each
- Returns a comparison DataFrame

Your implementation:

Solution:
def compare_strategies(returns: pd.Series, cov_matrix: pd.DataFrame) -> pd.DataFrame:
    """
    Compare multiple portfolio optimization strategies.

    Args:
        returns: Expected returns
        cov_matrix: Covariance matrix

    Returns:
        DataFrame comparing strategies
    """
    n = len(returns)
    strategies = {}

    # Equal weight
    equal_w = np.ones(n) / n
    strategies['Equal Weight'] = equal_w

    # Minimum volatility
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = tuple((0, 1) for _ in range(n))

    result = minimize(
        lambda w: np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))),
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )
    if result['success']:
        strategies['Min Volatility'] = result['x']

    # Maximum Sharpe
    result = minimize(
        lambda w: -np.dot(w, returns) / np.sqrt(np.dot(w.T, np.dot(cov_matrix, w))),
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )
    if result['success']:
        strategies['Max Sharpe'] = result['x']

    # Calculate stats
    results = []
    for name, weights in strategies.items():
        ret = np.dot(weights, returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
        results.append({
            'Strategy': name,
            'Return': ret,
            'Volatility': vol,
            'Sharpe': ret / vol
        })

    return pd.DataFrame(results).set_index('Strategy')

# Test
comparison = compare_strategies(annual_returns, cov_matrix)
print(comparison.round(4))

Exercise 5.6: Complete Optimization Framework (Open-ended)

Your Task:

Build a PortfolioOptimizerPro class that:

- Supports multiple optimization objectives
- Handles various constraint types (bounds, sectors)
- Includes efficient frontier generation
- Provides visualization methods

Your implementation:

Solution:
class PortfolioOptimizerPro:
    """Professional portfolio optimization framework."""

    def __init__(self, returns: pd.Series, cov_matrix: pd.DataFrame,
                 tickers: list = None, rf: float = 0):
        self.returns = returns
        self.cov_matrix = cov_matrix
        self.tickers = tickers or list(returns.index)
        self.rf = rf
        self.n = len(returns)
        self.results = {}

    def _calc_stats(self, weights):
        ret = np.dot(weights, self.returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        return {'return': ret, 'volatility': vol, 'sharpe': (ret - self.rf) / vol}

    def optimize(self, objective: str = 'max_sharpe', 
                 min_weight: float = 0, max_weight: float = 1,
                 sector_limits: dict = None) -> np.ndarray:
        """Run optimization with specified objective and constraints."""
        bounds = tuple((min_weight, max_weight) for _ in range(self.n))
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]

        if objective == 'max_sharpe':
            obj_func = lambda w: -np.dot(w, self.returns) / np.sqrt(np.dot(w.T, np.dot(self.cov_matrix, w)))
        elif objective == 'min_vol':
            obj_func = lambda w: np.sqrt(np.dot(w.T, np.dot(self.cov_matrix, w)))
        else:
            raise ValueError(f"Unknown objective: {objective}")

        result = minimize(
            obj_func,
            np.ones(self.n) / self.n,
            method='SLSQP',
            bounds=bounds,
            constraints=constraints
        )

        if result['success']:
            self.results[objective] = {
                'weights': result['x'],
                'stats': self._calc_stats(result['x'])
            }
            return result['x']
        return None

    def efficient_frontier(self, n_points: int = 30) -> pd.DataFrame:
        """Generate efficient frontier."""
        targets = np.linspace(self.returns.min(), self.returns.max(), n_points)
        frontier = []

        for target in targets:
            constraints = [
                {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
                {'type': 'eq', 'fun': lambda w, t=target: np.dot(w, self.returns) - t}
            ]
            result = minimize(
                lambda w: np.sqrt(np.dot(w.T, np.dot(self.cov_matrix, w))),
                np.ones(self.n) / self.n,
                method='SLSQP',
                bounds=tuple((0, 1) for _ in range(self.n)),
                constraints=constraints
            )
            if result['success']:
                frontier.append({
                    'return': np.dot(result['x'], self.returns),
                    'volatility': np.sqrt(np.dot(result['x'].T, np.dot(self.cov_matrix, result['x'])))
                })

        return pd.DataFrame(frontier)

    def plot(self):
        """Plot results."""
        frontier = self.efficient_frontier()

        plt.figure(figsize=(12, 8))
        plt.plot(frontier['volatility'], frontier['return'], 'b-', linewidth=2, label='Efficient Frontier')

        for name, data in self.results.items():
            stats = data['stats']
            plt.scatter(stats['volatility'], stats['return'], s=150, marker='D', 
                       label=f"{name} (SR={stats['sharpe']:.2f})", zorder=5)

        plt.xlabel('Volatility')
        plt.ylabel('Return')
        plt.title('Portfolio Optimization Results')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()

# Test
opt_pro = PortfolioOptimizerPro(annual_returns, cov_matrix, tickers)
opt_pro.optimize('max_sharpe')
opt_pro.optimize('min_vol')
opt_pro.plot()

Module Project: Complete Portfolio Optimization Workflow

Build a complete portfolio optimization workflow comparing multiple strategies.

Your Challenge:

  1. Run multiple optimization strategies
  2. Compare portfolio statistics
  3. Visualize weight allocations
  4. Generate the efficient frontier
  5. Summarize findings
# YOUR CODE HERE - Module Project
Solution:
# Complete Portfolio Optimization Project

# Step 1: Run multiple strategies
optimizer = PortfolioOptimizer(annual_returns, cov_matrix)

strategies = {
    'Equal Weight': np.ones(n_assets) / n_assets,
    'Min Volatility': optimizer.minimize_volatility(),
    'Max Sharpe': optimizer.maximize_sharpe()
}

# Step 2: Compare statistics
print("Strategy Comparison")
print("=" * 70)
comparison_data = []
for name, weights in strategies.items():
    stats = optimizer.get_stats(weights)
    comparison_data.append({
        'Strategy': name,
        'Return': stats['return'],
        'Volatility': stats['volatility'],
        'Sharpe': stats['sharpe']
    })

comparison_df = pd.DataFrame(comparison_data).set_index('Strategy')
print(comparison_df.round(4))

# Step 3: Visualize allocations
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (name, weights) in zip(axes, strategies.items()):
    bars = ax.barh(tickers, weights * 100)
    ax.set_xlabel('Weight (%)')
    ax.set_title(name)
    ax.axvline(x=0, color='black', linewidth=0.5)
plt.tight_layout()
plt.show()

# Step 4: Summary
best_sharpe = comparison_df['Sharpe'].idxmax()
lowest_vol = comparison_df['Volatility'].idxmin()

print(f"\nKey Findings:")
print(f"  Best Risk-Adjusted: {best_sharpe} (Sharpe = {comparison_df.loc[best_sharpe, 'Sharpe']:.4f})")
print(f"  Lowest Risk: {lowest_vol} (Vol = {comparison_df.loc[lowest_vol, 'Volatility']*100:.2f}%)")

Key Takeaways

What You Learned

1. Optimization Fundamentals

  • Minimize volatility or maximize Sharpe ratio
  • Use scipy.optimize.minimize with constraints

2. Constraints

  • Long-only: bounds = tuple((0, 1) for _ in range(n))
  • Sum to 1: {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
  • Position limits: Custom bounds
  • Sector limits: Inequality constraints
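These constraint patterns compose in a single minimize call. A minimal sketch with placeholder data (the expected returns, covariance matrix, and "tech" indices below are invented purely for illustration, not taken from this module's datasets):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 6
mu = rng.uniform(0.05, 0.15, n)             # placeholder expected returns
A = rng.normal(size=(n, n))
sigma = A @ A.T / n + np.eye(n) * 0.01      # placeholder covariance (positive definite)
tech_idx = [0, 1]                           # hypothetical "tech" assets

bounds = tuple((0.0, 0.25) for _ in range(n))  # long-only + 25% position limit
constraints = [
    {'type': 'eq',   'fun': lambda w: np.sum(w) - 1},                       # fully invested
    {'type': 'ineq', 'fun': lambda w: 0.40 - sum(w[i] for i in tech_idx)},  # sector cap 40%
]

result = minimize(
    lambda w: np.sqrt(w @ sigma @ w),       # objective: portfolio volatility
    np.ones(n) / n,                         # equal-weight starting point
    method='SLSQP',
    bounds=bounds,
    constraints=constraints,
)

w_opt = result.x
print("weights:", np.round(w_opt, 4), "sum:", round(w_opt.sum(), 6))
```

Bounds handle per-asset limits, while equality and inequality constraints handle portfolio-level rules; SLSQP accepts any mix of the two.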

3. Target Optimization

  • Target return: Add return equality constraint
  • Target volatility: Add volatility equality constraint

4. Trade-offs

  • Each added constraint can only worsen (or leave unchanged) the in-sample optimum
  • But constrained portfolios are more realistic and actually implementable

Key Code Patterns

# Basic optimization
result = minimize(
    objective_function,
    initial_weights,
    method='SLSQP',
    bounds=bounds,
    constraints=constraints
)

Coming Up Next

In Module 6: Advanced Portfolio Techniques, we'll explore Risk Parity, Black-Litterman, and Hierarchical Risk Parity.


Congratulations on completing Module 5!

Module 6: Advanced Portfolio Techniques

Course 3: Quantitative Finance
Part 2: Portfolio Theory


Learning Objectives

By the end of this module, you will be able to:

  1. Implement Risk Parity portfolios for equal risk contribution
  2. Apply the Black-Litterman model to incorporate views
  3. Build robust portfolios that reduce estimation error
  4. Use Hierarchical Risk Parity (HRP) for ML-based allocation
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 5: Mean-Variance Optimization

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy.optimize import minimize
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Data

# Download multi-asset data
tickers = ['SPY', 'TLT', 'GLD', 'VNQ', 'EFA', 'EEM', 'IEF', 'DBC']
# SPY: US Equities, TLT: Long Bonds, GLD: Gold, VNQ: REITs
# EFA: Developed Intl, EEM: Emerging, IEF: Intermediate Bonds, DBC: Commodities

end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("Downloading multi-asset data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.xs('Close', axis=1, level=1)
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

prices.columns = [str(col) for col in prices.columns]
returns = prices.pct_change().dropna()

# Calculate statistics
annual_returns = returns.mean() * 252
annual_volatility = returns.std() * np.sqrt(252)
cov_matrix = returns.cov() * 252
corr_matrix = returns.corr()
n_assets = len(tickers)

print(f"\nData loaded: {len(prices)} trading days")
print(f"Assets: {list(prices.columns)}")

Section 6.1: Risk Parity

A 60/40 stock/bond portfolio sounds balanced, but stocks contribute ~90% of the portfolio's risk!

In this section, you will learn:

- How to calculate risk contribution by asset
- Building portfolios where each asset contributes equal risk
- Risk budgeting with custom allocations
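To see why the 60/40 claim holds, here is a minimal two-asset sketch with assumed round numbers (16% equity volatility, 7% bond volatility, correlation 0.2 — illustrative figures, not estimates from this module's data):

```python
import numpy as np

# Assumed (illustrative) inputs: classic 60/40 weights,
# 16% equity vol, 7% bond vol, correlation 0.2.
w = np.array([0.60, 0.40])
vols = np.array([0.16, 0.07])
corr = 0.2
cov = np.array([[vols[0]**2,               corr * vols[0] * vols[1]],
                [corr * vols[0] * vols[1], vols[1]**2]])

port_vol = np.sqrt(w @ cov @ w)
rc = w * (cov @ w) / port_vol    # risk contribution of each asset
rc_pct = rc / rc.sum()

print(f"Portfolio vol: {port_vol*100:.2f}%")
print(f"Equity risk share: {rc_pct[0]*100:.1f}%")  # roughly 88% under these assumptions
```

Under these assumptions equities supply roughly 88% of total portfolio risk despite being only 60% of capital, which is exactly what risk parity is designed to correct.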

6.1.1 Risk Contribution

The risk contribution of asset $i$ is:

$$RC_i = w_i \cdot \frac{\partial \sigma_p}{\partial w_i} = w_i \cdot \frac{(\Sigma w)_i}{\sigma_p}$$

For a risk parity portfolio: $$RC_i = RC_j \quad \forall i,j$$

# Core risk functions
def portfolio_volatility(weights: np.ndarray, cov_matrix: pd.DataFrame) -> float:
    """Calculate portfolio volatility."""
    return np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))

def risk_contribution(weights: np.ndarray, cov_matrix: pd.DataFrame) -> np.ndarray:
    """Calculate risk contribution of each asset."""
    port_vol = portfolio_volatility(weights, cov_matrix)
    marginal_contrib = np.dot(cov_matrix, weights) / port_vol
    return weights * marginal_contrib

def risk_contribution_pct(weights: np.ndarray, cov_matrix: pd.DataFrame) -> np.ndarray:
    """Risk contribution as percentage of total."""
    rc = risk_contribution(weights, cov_matrix)
    return rc / rc.sum() * 100

print("Risk functions defined")
# Equal Weight Portfolio - Risk Contribution
equal_weights = np.array([1/n_assets] * n_assets)

print("Equal Weight Portfolio - Risk Contribution")
print("=" * 50)
rc_equal = risk_contribution_pct(equal_weights, cov_matrix)
for ticker, rc in zip(tickers, rc_equal):
    print(f"  {ticker}: {rc:.2f}%")

print(f"\nNotice: Even with equal weights, risk contribution varies widely!")

6.1.2 Risk Parity Optimization

def risk_parity_objective(weights: np.ndarray, cov_matrix: pd.DataFrame) -> float:
    """Objective: minimize deviation from equal risk contribution."""
    rc = risk_contribution(weights, cov_matrix)
    target = np.ones(len(weights)) / len(weights) * rc.sum()
    return np.sum((rc - target) ** 2)

def optimize_risk_parity(cov_matrix: pd.DataFrame) -> np.ndarray:
    """Find risk parity weights."""
    n = cov_matrix.shape[0]
    x0 = np.ones(n) / n
    
    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = tuple((0.01, 1) for _ in range(n))
    
    result = minimize(
        risk_parity_objective,
        x0,
        args=(cov_matrix,),
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )
    
    return result['x'] if result['success'] else None
# Find Risk Parity portfolio
rp_weights = optimize_risk_parity(cov_matrix)

print("Risk Parity Portfolio")
print("=" * 55)
print(f"\n{'Asset':<8} {'Weight':>10} {'Risk Contrib':>15}")
print("-" * 40)

rc_rp = risk_contribution_pct(rp_weights, cov_matrix)
for ticker, w, rc in zip(tickers, rp_weights, rc_rp):
    print(f"{ticker:<8} {w*100:>9.2f}% {rc:>14.2f}%")

print(f"\nPortfolio Volatility: {portfolio_volatility(rp_weights, cov_matrix)*100:.2f}%")

Exercise 6.1: Risk Budgeting (Guided)

Your Task: Implement risk budgeting where you can specify custom risk allocations per asset.

Fill in the blanks:

Solution:
def risk_budgeting_objective(weights: np.ndarray, cov_matrix: pd.DataFrame,
                            risk_budget: np.ndarray) -> float:
    """Objective: match specified risk budget."""
    rc = risk_contribution(weights, cov_matrix)
    rc_pct = rc / rc.sum()
    return np.sum((rc_pct - risk_budget) ** 2)

def optimize_risk_budget(cov_matrix: pd.DataFrame, 
                         risk_budget: np.ndarray) -> np.ndarray:
    """Find weights for a given risk budget."""
    n = cov_matrix.shape[0]
    x0 = np.ones(n) / n

    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = tuple((0.01, 1) for _ in range(n))

    result = minimize(
        risk_budgeting_objective,
        x0,
        args=(cov_matrix, risk_budget),
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )

    return result['x'] if result['success'] else None

# Test: 60% equity risk (SPY, EFA, EEM), 40% defensive
risk_budget = np.array([0.20, 0.10, 0.10, 0.10, 0.20, 0.20, 0.05, 0.05])
rb_weights = optimize_risk_budget(cov_matrix, risk_budget)

print("Risk Budgeting Portfolio")
rc_rb = risk_contribution_pct(rb_weights, cov_matrix)
for ticker, w, rc, target in zip(tickers, rb_weights, rc_rb, risk_budget*100):
    print(f"{ticker}: Weight={w*100:.2f}%, RC={rc:.2f}%, Target={target:.2f}%")

Section 6.2: Black-Litterman Model

Mean-variance optimization requires expected-return estimates, which are notoriously unreliable. The Black-Litterman model addresses this by blending market-equilibrium returns with investor views.

In this section, you will learn:

  • Deriving equilibrium returns from market weights
  • Expressing and incorporating views
  • Computing posterior expected returns

6.2.1 Equilibrium Returns

Equilibrium returns (implied by market): $$\Pi = \delta \Sigma w_{mkt}$$

Where:

  • $\delta$ = risk aversion coefficient (typically 2-4)
  • $\Sigma$ = covariance matrix
  • $w_{mkt}$ = market capitalization weights

# Calculate equilibrium returns
market_weights = np.array([1/n_assets] * n_assets)  # Simplified
delta = 2.5  # Risk aversion

equilibrium_returns = delta * np.dot(cov_matrix, market_weights)

print("Equilibrium Returns (Implied by Market)")
print("=" * 45)
for ticker, ret in zip(tickers, equilibrium_returns):
    print(f"  {ticker}: {ret*100:.2f}%")
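The phrase "implied by market" can be made concrete with a quick sanity check: equilibrium returns are reverse-optimized so that the unconstrained mean-variance weights reproduce the market weights exactly. A minimal sketch with a synthetic two-asset covariance (the numbers are illustrative, not from the data above):

```python
import numpy as np

# Reverse optimization: Pi = delta * Sigma @ w_mkt is constructed so that
# the unconstrained MVO solution w* = (1/delta) * Sigma^{-1} @ Pi
# recovers the market weights exactly.
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])             # toy 2-asset covariance (annualized)
w_mkt = np.array([0.6, 0.4])                 # toy market-cap weights
delta = 2.5                                  # risk aversion

Pi = delta * Sigma @ w_mkt                   # implied equilibrium returns
w_star = np.linalg.solve(Sigma, Pi) / delta  # unconstrained MVO weights

print(np.allclose(w_star, w_mkt))            # True: market weights recovered
```

This is why Black-Litterman starts from $\Pi$ rather than from historical means: absent any views, the model simply hands back the market portfolio.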

6.2.2 Black-Litterman Formula

Posterior expected returns: $$E[R] = [(\tau\Sigma)^{-1} + P'\Omega^{-1}P]^{-1} [(\tau\Sigma)^{-1}\Pi + P'\Omega^{-1}Q]$$

def black_litterman(cov_matrix: np.ndarray, market_weights: np.ndarray,
                    P: np.ndarray, Q: np.ndarray, omega: np.ndarray,
                    tau: float = 0.05, delta: float = 2.5) -> tuple:
    """
    Black-Litterman model.
    
    Args:
        cov_matrix: Asset covariance matrix
        market_weights: Market cap weights
        P: View matrix (k x n)
        Q: View vector (k x 1)
        omega: View uncertainty matrix (k x k)
        tau: Scalar (0.025-0.05 typical)
        delta: Risk aversion coefficient
        
    Returns:
        Tuple of (posterior_returns, posterior_cov)
    """
    # Equilibrium returns
    pi = delta * np.dot(cov_matrix, market_weights)
    
    # Scaled covariance
    tau_sigma = tau * cov_matrix
    tau_sigma_inv = np.linalg.inv(tau_sigma)
    omega_inv = np.linalg.inv(omega)
    
    # Posterior precision and covariance
    posterior_precision = tau_sigma_inv + np.dot(P.T, np.dot(omega_inv, P))
    posterior_cov = np.linalg.inv(posterior_precision)
    
    # Posterior mean
    posterior_returns = np.dot(posterior_cov, 
                               np.dot(tau_sigma_inv, pi) + np.dot(P.T, np.dot(omega_inv, Q)))
    
    return posterior_returns, posterior_cov

print("Black-Litterman function defined")
# Example views:
# View 1: SPY will return 8% (absolute)
# View 2: EEM will outperform EFA by 2% (relative)

# P matrix: rows=views, columns=assets
# SPY=0, TLT=1, GLD=2, VNQ=3, EFA=4, EEM=5, IEF=6, DBC=7
P = np.array([
    [1, 0, 0, 0, 0, 0, 0, 0],      # SPY
    [0, 0, 0, 0, -1, 1, 0, 0]      # EEM - EFA
])

Q = np.array([0.08, 0.02])  # View returns

# Omega: view uncertainty (diagonal)
omega = np.diag([0.001, 0.002])

print("Investor Views:")
print("  View 1: SPY returns 8% (high confidence)")
print("  View 2: EEM outperforms EFA by 2% (moderate confidence)")
# Apply Black-Litterman
bl_returns, bl_cov = black_litterman(
    cov_matrix.values, market_weights, P, Q, omega
)

print("Black-Litterman Expected Returns")
print("=" * 55)
print(f"\n{'Asset':<8} {'Equilibrium':>12} {'BL Return':>12} {'Change':>10}")
print("-" * 50)

for i, ticker in enumerate(tickers):
    eq_ret = equilibrium_returns[i]
    bl_ret = bl_returns[i]
    change = bl_ret - eq_ret
    print(f"{ticker:<8} {eq_ret*100:>11.2f}% {bl_ret*100:>11.2f}% {change*100:>+9.2f}%")

Exercise 6.2: Black-Litterman Optimizer (Guided)

Your Task: Build a complete Black-Litterman workflow with optimization.

Fill in the blanks:

Exercise
Click to reveal solution
def bl_optimize(returns: np.ndarray, cov_matrix: np.ndarray,
                long_only: bool = True) -> np.ndarray:
    """Optimize portfolio using Black-Litterman returns."""
    n = len(returns)

    def neg_sharpe(w):
        ret = np.dot(w, returns)
        vol = np.sqrt(np.dot(w.T, np.dot(cov_matrix, w)))
        return -ret / vol

    constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
    bounds = tuple((0, 1) for _ in range(n)) if long_only else None

    result = minimize(
        neg_sharpe,
        np.ones(n) / n,
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )

    return result['x'] if result['success'] else None

bl_weights = bl_optimize(bl_returns, bl_cov)
print("Black-Litterman Optimized Portfolio")
print("=" * 45)
for ticker, w in zip(tickers, bl_weights):
    print(f"  {ticker}: {w*100:.2f}%")

Exercise 6.3: Custom Views Implementation (Open-ended)

Your Task:

Build a function that:

  • Takes a list of view specifications (dictionaries)
  • Constructs the P matrix and Q vector automatically
  • Allows specifying confidence levels
  • Returns Black-Litterman expected returns

Your implementation:

Exercise
Click to reveal solution
def build_views(tickers: list, views: list) -> tuple:
    """
    Build P, Q, omega matrices from view specifications.

    Args:
        tickers: List of ticker symbols
        views: List of dicts with keys:
               - 'asset': ticker for absolute view, OR
               - 'long': ticker to go long, 'short': ticker to short
               - 'return': expected return
               - 'confidence': 'high', 'medium', 'low'

    Returns:
        Tuple of (P, Q, omega)
    """
    n_assets = len(tickers)
    n_views = len(views)
    ticker_idx = {t: i for i, t in enumerate(tickers)}

    confidence_map = {'high': 0.001, 'medium': 0.002, 'low': 0.005}

    P = np.zeros((n_views, n_assets))
    Q = np.zeros(n_views)
    omega_diag = np.zeros(n_views)

    for i, view in enumerate(views):
        Q[i] = view['return']
        omega_diag[i] = confidence_map.get(view.get('confidence', 'medium'), 0.002)

        if 'asset' in view:  # Absolute view
            P[i, ticker_idx[view['asset']]] = 1
        else:  # Relative view
            P[i, ticker_idx[view['long']]] = 1
            P[i, ticker_idx[view['short']]] = -1

    return P, Q, np.diag(omega_diag)

# Test
views = [
    {'asset': 'SPY', 'return': 0.10, 'confidence': 'high'},
    {'long': 'EEM', 'short': 'EFA', 'return': 0.03, 'confidence': 'medium'},
    {'asset': 'GLD', 'return': 0.05, 'confidence': 'low'}
]

P, Q, omega = build_views(tickers, views)
bl_ret, bl_cov = black_litterman(cov_matrix.values, market_weights, P, Q, omega)

print("BL Returns with Custom Views:")
for t, r in zip(tickers, bl_ret):
    print(f"  {t}: {r*100:.2f}%")

Section 6.3: Robust Optimization

MVO is highly sensitive to input estimates: small changes in expected returns or covariances can produce large swings in the optimal weights. Robust methods reduce this sensitivity.

In this section, you will learn:

  • Shrinkage estimators for covariance
  • Resampled efficiency
  • Evaluating weight stability

6.3.1 Shrinkage Estimators

def ledoit_wolf_shrinkage(returns: pd.DataFrame, shrinkage: float = 0.2) -> np.ndarray:
    """
    Simplified Ledoit-Wolf shrinkage.
    Shrinks sample covariance toward scaled identity.
    """
    sample_cov = returns.cov().values * 252
    p = sample_cov.shape[0]
    
    # Target: scaled identity
    mu = np.trace(sample_cov) / p
    target = mu * np.eye(p)
    
    # Shrunk covariance
    shrunk_cov = shrinkage * target + (1 - shrinkage) * sample_cov
    
    return shrunk_cov

shrunk_cov = ledoit_wolf_shrinkage(returns)
print("Shrinkage applied: sample covariance pulled 20% toward scaled identity")
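One way to see why shrinkage helps: the sample covariance of a short return history is often ill-conditioned, and blending it with a scaled identity compresses the eigenvalue spread, which provably lowers the condition number. A quick illustration on synthetic data (the sample size, asset count, and 0.2 intensity are all illustrative):

```python
import numpy as np

# Shrinking toward a scaled identity compresses the eigenvalue spread of the
# covariance estimate, lowering its condition number and stabilizing the
# downstream optimization.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))          # short history: 60 observations, 8 assets
S = np.cov(X.T)                       # noisy sample covariance

mu = np.trace(S) / S.shape[0]         # average eigenvalue
target = mu * np.eye(S.shape[0])
S_shrunk = 0.2 * target + 0.8 * S     # 20% shrinkage

print(f"cond(sample) = {np.linalg.cond(S):.1f}")
print(f"cond(shrunk) = {np.linalg.cond(S_shrunk):.1f}")  # strictly smaller
```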

6.3.2 Resampled Efficiency

def resampled_optimization(returns: np.ndarray, cov_matrix: np.ndarray,
                           n_samples: int = 100) -> tuple:
    """
    Resampled efficient frontier.
    Bootstrap samples for more stable weights.
    """
    n_assets = len(returns)
    all_weights = []
    
    np.random.seed(42)
    
    for _ in range(n_samples):
        # Simulate returns from distribution
        sim_returns = np.random.multivariate_normal(
            returns, cov_matrix / 252, size=252
        )
        
        sim_mean = sim_returns.mean(axis=0) * 252
        sim_cov = np.cov(sim_returns.T) * 252
        
        # Optimize
        def neg_sharpe(w):
            return -np.dot(w, sim_mean) / np.sqrt(np.dot(w.T, np.dot(sim_cov, w)))
        
        result = minimize(
            neg_sharpe,
            np.ones(n_assets) / n_assets,
            method='SLSQP',
            bounds=tuple((0, 1) for _ in range(n_assets)),
            constraints=[{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        )
        
        if result['success']:
            all_weights.append(result['x'])
    
    # Average weights
    avg_weights = np.mean(all_weights, axis=0)
    avg_weights = avg_weights / avg_weights.sum()
    
    return avg_weights, np.array(all_weights)

print("Running resampled optimization (100 samples)...")
resampled_weights, all_resampled = resampled_optimization(
    annual_returns.values, cov_matrix.values
)

print("\nResampled Efficient Portfolio")
for ticker, w in zip(tickers, resampled_weights):
    print(f"  {ticker}: {w*100:.2f}%")

Exercise 6.4: Weight Stability Analysis (Open-ended)

Your Task:

Build a function that:

  • Takes the resampled weight matrix
  • Calculates statistics for each asset (mean, std, confidence interval)
  • Visualizes the weight uncertainty with box plots
  • Returns a DataFrame with weight statistics

Your implementation:

Exercise
Click to reveal solution
def analyze_weight_stability(all_weights: np.ndarray, 
                             tickers: list) -> pd.DataFrame:
    """
    Analyze stability of resampled weights.

    Args:
        all_weights: Array of shape (n_samples, n_assets)
        tickers: List of ticker symbols

    Returns:
        DataFrame with weight statistics
    """
    stats = []

    for i, ticker in enumerate(tickers):
        weights_i = all_weights[:, i]
        stats.append({
            'Ticker': ticker,
            'Mean': weights_i.mean(),
            'Std': weights_i.std(),
            'Min': weights_i.min(),
            'Max': weights_i.max(),
            'CI_Lower': np.percentile(weights_i, 2.5),
            'CI_Upper': np.percentile(weights_i, 97.5)
        })

    df = pd.DataFrame(stats).set_index('Ticker')

    # Visualize
    fig, ax = plt.subplots(figsize=(12, 6))
    bp = ax.boxplot([all_weights[:, i] * 100 for i in range(len(tickers))],
                    positions=range(len(tickers)))
    ax.set_xticks(range(len(tickers)))
    ax.set_xticklabels(tickers)
    ax.set_ylabel('Weight (%)')
    ax.set_title('Weight Uncertainty from Resampling')
    ax.axhline(y=100/len(tickers), color='red', linestyle='--', alpha=0.5, label='Equal Weight')
    ax.legend()
    plt.tight_layout()
    plt.show()

    return df

# Test
stability_df = analyze_weight_stability(all_resampled, tickers)
print(stability_df.round(4))

Section 6.4: Hierarchical Risk Parity (HRP)

HRP is a modern, machine-learning-inspired approach that requires no return estimates and, unlike MVO, never inverts the covariance matrix.

In this section, you will learn:

  • Hierarchical clustering of assets
  • Quasi-diagonalization
  • Recursive bisection for allocation

6.4.1 HRP Algorithm

  1. Tree Clustering: Build hierarchy based on correlation distance
  2. Quasi-Diagonalization: Reorder assets to cluster similar ones
  3. Recursive Bisection: Split and allocate inversely to variance
def correlation_distance(corr_matrix: np.ndarray) -> np.ndarray:
    """Convert correlation to distance."""
    return np.sqrt(0.5 * (1 - corr_matrix))

def get_quasi_diag(link: np.ndarray) -> list:
    """Get quasi-diagonal order from linkage."""
    link = link.astype(int)
    sort_ix = pd.Series([link[-1, 0], link[-1, 1]])
    num_items = link[-1, 3]
    
    while sort_ix.max() >= num_items:
        sort_ix.index = range(0, sort_ix.shape[0] * 2, 2)
        df0 = sort_ix[sort_ix >= num_items]
        i = df0.index
        j = df0.values - num_items
        sort_ix[i] = link[j, 0]
        df0 = pd.Series(link[j, 1], index=i + 1)
        sort_ix = pd.concat([sort_ix, df0]).sort_index()
        sort_ix.index = range(sort_ix.shape[0])
    
    return sort_ix.tolist()

def get_cluster_var(cov: pd.DataFrame, cluster_items: list) -> float:
    """Calculate variance of a cluster."""
    cov_slice = cov.iloc[cluster_items, cluster_items]
    w = 1 / np.diag(cov_slice)
    w = w / w.sum()
    return np.dot(w, np.dot(cov_slice, w))

def hrp_allocation(cov: pd.DataFrame, sort_ix: list) -> pd.Series:
    """Recursive bisection allocation."""
    w = pd.Series(1.0, index=sort_ix)
    cluster_items = [sort_ix]
    
    while len(cluster_items) > 0:
        cluster_items = [i[j:k] for i in cluster_items 
                        for j, k in ((0, len(i) // 2), (len(i) // 2, len(i))) 
                        if len(i) > 1]
        
        for i in range(0, len(cluster_items), 2):
            c0 = cluster_items[i]
            c1 = cluster_items[i + 1]
            
            var0 = get_cluster_var(cov, c0)
            var1 = get_cluster_var(cov, c1)
            
            alpha = 1 - var0 / (var0 + var1)
            w[c0] *= alpha
            w[c1] *= 1 - alpha
    
    return w

print("HRP functions defined")
# Step 1: Calculate distance and cluster
dist_matrix = correlation_distance(corr_matrix.values)
dist_array = squareform(dist_matrix, checks=False)
link = linkage(dist_array, method='ward')

# Plot dendrogram
plt.figure(figsize=(12, 6))
dendrogram(link, labels=tickers, leaf_rotation=0)
plt.title('Hierarchical Clustering of Assets', fontsize=14, fontweight='bold')
plt.xlabel('Asset')
plt.ylabel('Distance')
plt.tight_layout()
plt.show()
# Step 2 & 3: Quasi-diagonal order and allocation
sort_ix = get_quasi_diag(link)
print(f"Quasi-diagonal ordering: {' -> '.join([tickers[i] for i in sort_ix])}")

cov_df = pd.DataFrame(cov_matrix.values, index=range(n_assets), columns=range(n_assets))
hrp_w = hrp_allocation(cov_df, sort_ix)

# Map to original order
hrp_weights = np.zeros(n_assets)
for i, idx in enumerate(sort_ix):
    hrp_weights[idx] = hrp_w.iloc[i]

print("\nHierarchical Risk Parity Weights:")
for ticker, w in zip(tickers, hrp_weights):
    print(f"  {ticker}: {w*100:.2f}%")

Exercise 6.5: Complete HRP Class (Guided)

Your Task: Build a complete HRP class encapsulating all steps.

Fill in the blanks:

Exercise
Click to reveal solution
class HierarchicalRiskParity:
    """Hierarchical Risk Parity portfolio allocation."""

    def __init__(self, returns: pd.DataFrame):
        self.returns = returns
        self.tickers = list(returns.columns)
        self.n_assets = len(self.tickers)
        self.cov_matrix = returns.cov() * 252
        self.corr_matrix = returns.corr()
        self.weights = None

    def fit(self):
        """Run HRP algorithm."""
        dist = correlation_distance(self.corr_matrix.values)
        dist_array = squareform(dist, checks=False)
        self.link = linkage(dist_array, method='ward')
        self.sort_ix = get_quasi_diag(self.link)

        cov_df = pd.DataFrame(self.cov_matrix.values, 
                              index=range(self.n_assets), 
                              columns=range(self.n_assets))
        hrp_w = hrp_allocation(cov_df, self.sort_ix)

        self.weights = np.zeros(self.n_assets)
        for i, idx in enumerate(self.sort_ix):
            self.weights[idx] = hrp_w.iloc[i]

        return self

    def get_weights(self) -> dict:
        return dict(zip(self.tickers, self.weights))

hrp = HierarchicalRiskParity(returns).fit()
print("HRP Weights:")
for ticker, w in hrp.get_weights().items():
    print(f"  {ticker}: {w*100:.2f}%")

Exercise 6.6: Complete Portfolio Allocator (Open-ended)

Your Task:

Build an AdvancedPortfolioAllocator class that:

  • Implements all techniques: Equal Weight, Risk Parity, HRP
  • Calculates performance statistics for each
  • Provides comparison methods
  • Includes visualization

Your implementation:

Exercise
Click to reveal solution
class AdvancedPortfolioAllocator:
    """Advanced portfolio allocation with multiple techniques."""

    def __init__(self, returns: pd.DataFrame):
        self.returns = returns
        self.tickers = list(returns.columns)
        self.n_assets = len(self.tickers)
        self.cov_matrix = returns.cov() * 252
        self.annual_returns = returns.mean() * 252
        self.results = {}

    def equal_weight(self):
        """Equal weight allocation."""
        self.results['Equal Weight'] = np.ones(self.n_assets) / self.n_assets
        return self

    def risk_parity(self):
        """Risk parity allocation."""
        self.results['Risk Parity'] = optimize_risk_parity(self.cov_matrix)
        return self

    def hrp(self):
        """HRP allocation."""
        hrp_model = HierarchicalRiskParity(self.returns).fit()
        self.results['HRP'] = hrp_model.weights
        return self

    def run_all(self):
        """Run all allocation methods."""
        return self.equal_weight().risk_parity().hrp()

    def get_stats(self, weights: np.ndarray) -> dict:
        """Get portfolio statistics."""
        ret = np.dot(weights, self.annual_returns)
        vol = np.sqrt(np.dot(weights.T, np.dot(self.cov_matrix, weights)))
        return {'return': ret, 'volatility': vol, 'sharpe': ret / vol}

    def compare(self) -> pd.DataFrame:
        """Compare all methods."""
        if not self.results:
            self.run_all()

        data = []
        for name, weights in self.results.items():
            stats = self.get_stats(weights)
            data.append({
                'Method': name,
                'Return': stats['return'],
                'Volatility': stats['volatility'],
                'Sharpe': stats['sharpe']
            })

        return pd.DataFrame(data).set_index('Method')

    def plot(self):
        """Plot weight comparison."""
        if not self.results:
            self.run_all()

        df = pd.DataFrame(self.results, index=self.tickers) * 100
        ax = df.plot(kind='bar', figsize=(12, 6), width=0.8)
        ax.set_ylabel('Weight (%)')
        ax.set_title('Portfolio Allocation Comparison')
        plt.xticks(rotation=0)
        plt.legend(loc='upper right')
        plt.tight_layout()
        plt.show()

# Test
allocator = AdvancedPortfolioAllocator(returns)
allocator.run_all()
print(allocator.compare().round(4))
allocator.plot()

Module Project: Advanced Portfolio Allocation System

Build a complete system implementing all advanced techniques.

Your Challenge:

  1. Implement Equal Weight, Risk Parity, and HRP
  2. Compare all methods on return, risk, and Sharpe
  3. Visualize weight allocations
  4. Analyze risk contribution for each method
  5. Summarize findings and recommendations
# YOUR CODE HERE - Module Project
Click to reveal solution
# Complete Advanced Allocation Project

# Step 1: Run all methods
methods = {
    'Equal Weight': equal_weights,
    'Risk Parity': rp_weights,
    'HRP': hrp_weights
}

# Step 2: Compare statistics
print("Method Comparison")
print("=" * 65)
print(f"{'Method':<15} {'Return':>10} {'Volatility':>12} {'Sharpe':>10}")
print("-" * 55)

for name, weights in methods.items():
    ret = np.dot(weights, annual_returns) * 100
    vol = portfolio_volatility(weights, cov_matrix) * 100
    sharpe = (ret / 100) / (vol / 100)
    print(f"{name:<15} {ret:>9.2f}% {vol:>11.2f}% {sharpe:>10.3f}")

# Step 3: Visualize allocations
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (name, weights) in zip(axes, methods.items()):
    ax.barh(tickers, weights * 100)
    ax.set_xlabel('Weight (%)')
    ax.set_title(name)
plt.tight_layout()
plt.show()

# Step 4: Risk contribution analysis
print("\nRisk Contribution Analysis")
print("=" * 60)
for name, weights in methods.items():
    rc = risk_contribution_pct(weights, cov_matrix)
    rc_std = np.std(rc)
    print(f"{name}: RC Std Dev = {rc_std:.2f}% (lower is more balanced)")

# Step 5: Summary
print("\nRecommendations:")
print("  - Use Risk Parity for balanced risk contribution")
print("  - Use HRP when you distrust return forecasts")
print("  - Equal Weight provides a simple baseline")

Key Takeaways

What You Learned

1. Risk Parity

  • Allocate so each asset contributes equal risk
  • Extends to risk budgeting for custom allocations
  • More intuitive than MVO

2. Black-Litterman

  • Combines market equilibrium with investor views
  • More stable than pure MVO
  • Widely used by institutions

3. Robust Optimization

  • Shrinkage reduces estimation error
  • Resampling provides weight uncertainty estimates

4. Hierarchical Risk Parity

  • No return estimates needed
  • Uses clustering for asset grouping
  • Often outperforms MVO out-of-sample

When to Use Each Method

| Method | Best For |
|--------|----------|
| Risk Parity | Diversified multi-asset portfolios |
| Black-Litterman | When you have specific market views |
| Resampling | When uncertain about estimates |
| HRP | When you distrust return forecasts |

Coming Up Next

In Part 3: Risk Modeling, we'll dive into VaR, CVaR, stress testing, and factor models.


Congratulations on completing Module 6!

Module 7: Value at Risk (VaR)

Course 3: Quantitative Finance & Portfolio Theory
Part 3: Risk Modeling


Learning Objectives

By the end of this module, you will be able to:

  1. Calculate VaR using parametric, historical, and Monte Carlo methods
  2. Understand distributional assumptions behind each VaR approach
  3. Scale VaR across different time horizons
  4. Implement VaR backtesting and violation analysis
| Attribute | Value |
|-----------|-------|
| Duration | ~2.5 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 6: Advanced Portfolio Techniques |

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Data

# Download portfolio data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
end_date = datetime.now()
start_date = end_date - timedelta(days=10*365)

print("Downloading historical data...")
data = yf.download(tickers, start=start_date, end=end_date, progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

prices.columns = [str(col) for col in prices.columns]
returns = prices.pct_change().dropna()

# Create a portfolio (60% equities, 40% defensive)
portfolio_weights = np.array([0.40, 0.20, 0.25, 0.15])
portfolio_returns = returns.dot(portfolio_weights)

print(f"\nData loaded: {len(prices)} trading days")
print(f"Portfolio: {dict(zip(tickers, portfolio_weights))}")

Section 7.1: VaR Fundamentals

Value at Risk (VaR) answers a simple but powerful question: "What is the maximum loss we might experience over a given period at a given confidence level?"

In this section, you will learn:

  • The definition and interpretation of VaR
  • Common confidence levels and time horizons
  • How to express VaR in percentage and dollar terms

7.1.1 VaR Definition

VaR at confidence level $\alpha$ is the $\alpha$-th percentile of the loss distribution:

$$P(Loss > VaR_{\alpha}) = 1 - \alpha$$

Common parameters:

  • Confidence level: 95% or 99%
  • Time horizon: 1 day, 10 days, 1 month
  • Position size: dollar amount at risk
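The definition can be verified empirically before any plotting: by construction, roughly $1-\alpha$ of observations fall below the negative of the VaR threshold. A minimal sketch on synthetic normal returns (the mean and volatility here are illustrative, not fitted to the portfolio above):

```python
import numpy as np

# Verify the VaR definition empirically: about 5% of returns should fall
# below -VaR when VaR is taken at the 95% confidence level.
rng = np.random.default_rng(42)
r = rng.normal(loc=0.0005, scale=0.01, size=100_000)  # synthetic daily returns

var_95 = -np.percentile(r, 5)          # 95% VaR, reported as a positive loss
violation_rate = (r < -var_95).mean()  # fraction of days breaching the VaR

print(f"95% VaR: {var_95*100:.2f}%")
print(f"Violation rate: {violation_rate:.3f}")  # ~0.050 by construction
```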

# Visualize the concept of VaR
fig, ax = plt.subplots(figsize=(12, 6))

# Histogram of returns
n, bins, patches = ax.hist(portfolio_returns * 100, bins=100, density=True, 
                           alpha=0.7, color='steelblue', edgecolor='black')

# Calculate 95% VaR
var_95 = np.percentile(portfolio_returns, 5) * 100

# Color the tail
for i in range(len(patches)):
    if bins[i] < var_95:
        patches[i].set_facecolor('red')
        patches[i].set_alpha(0.7)

# Add VaR line
ax.axvline(x=var_95, color='red', linestyle='--', linewidth=2, 
           label=f'95% VaR = {var_95:.2f}%')

ax.set_xlabel('Daily Return (%)', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Value at Risk: The 5% Worst-Case Threshold', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.annotate('5% of days\nworse than VaR', xy=(var_95 - 1, 0.05), fontsize=10,
           ha='center', color='red')
plt.tight_layout()
plt.show()

print(f"\n95% Daily VaR: {var_95:.2f}%")
print(f"Interpretation: On 95% of days, losses will not exceed {abs(var_95):.2f}%")
# VaR in dollar terms
portfolio_value = 1_000_000  # $1 million portfolio

var_95_dollars = portfolio_value * abs(np.percentile(portfolio_returns, 5))
var_99_dollars = portfolio_value * abs(np.percentile(portfolio_returns, 1))

print(f"Portfolio Value: ${portfolio_value:,.0f}")
print(f"\n95% Daily VaR: ${var_95_dollars:,.0f}")
print(f"99% Daily VaR: ${var_99_dollars:,.0f}")

7.1.2 Time Scaling VaR

To convert daily VaR to different horizons (assuming i.i.d. returns):

$$VaR_T = VaR_1 \times \sqrt{T}$$

This is the square root of time rule.

# Scale VaR to different horizons
var_1d = abs(np.percentile(portfolio_returns, 5))

horizons = [1, 5, 10, 21, 63, 252]
horizon_names = ['1 Day', '1 Week', '2 Weeks', '1 Month', '1 Quarter', '1 Year']

print("95% VaR at Different Time Horizons")
print("=" * 50)
print(f"Portfolio Value: ${portfolio_value:,.0f}")
print()

for h, name in zip(horizons, horizon_names):
    var_h = var_1d * np.sqrt(h)
    var_h_dollars = portfolio_value * var_h
    print(f"{name:<12}: {var_h*100:>6.2f}%  (${var_h_dollars:>12,.0f})")

Section 7.2: Parametric VaR

Parametric VaR assumes returns follow a specific distribution (usually normal) and calculates VaR analytically.

In this section, you will learn:

  • The normal-distribution VaR formula
  • Student-t distribution for fat tails
  • Cornish-Fisher adjustment for skewness and kurtosis

7.2.1 Normal Distribution VaR

For normally distributed returns:

$$VaR_{\alpha} = -(\mu - z_{\alpha} \cdot \sigma) = z_{\alpha}\sigma - \mu$$

Where:

  • $\mu$ = mean daily return
  • $\sigma$ = standard deviation of daily returns
  • $z_{\alpha}$ = standard normal quantile (1.645 for 95%, 2.326 for 99%); the outer negation reports VaR as a positive loss

def parametric_var_normal(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    Calculate VaR assuming normal distribution.
    
    Args:
        returns: Daily return series
        confidence: Confidence level
    
    Returns:
        Tuple of (VaR, mean, std)
    """
    mu = returns.mean()
    sigma = returns.std()
    z = stats.norm.ppf(1 - confidence)
    var = -(mu + z * sigma)
    return var, mu, sigma

# Calculate parametric VaR
var_95_normal, mu, sigma = parametric_var_normal(portfolio_returns, 0.95)
var_99_normal, _, _ = parametric_var_normal(portfolio_returns, 0.99)

print("Parametric VaR (Normal Distribution)")
print("=" * 45)
print(f"Mean daily return: {mu*100:.4f}%")
print(f"Daily volatility: {sigma*100:.4f}%")
print(f"\n95% VaR: {var_95_normal*100:.2f}%")
print(f"99% VaR: {var_99_normal*100:.2f}%")

7.2.2 Student-t Distribution VaR

Financial returns often have fatter tails than normal. The Student-t distribution can capture this.

def parametric_var_t(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    Calculate VaR using Student-t distribution.
    
    Args:
        returns: Daily return series
        confidence: Confidence level
    
    Returns:
        Tuple of (VaR, degrees_of_freedom, loc, scale)
    """
    params = stats.t.fit(returns)
    df, loc, scale = params
    var = -stats.t.ppf(1 - confidence, df, loc, scale)
    return var, df, loc, scale

var_95_t, df, loc, scale = parametric_var_t(portfolio_returns, 0.95)
var_99_t, _, _, _ = parametric_var_t(portfolio_returns, 0.99)

print("Parametric VaR (Student-t Distribution)")
print("=" * 45)
print(f"Degrees of freedom: {df:.2f}")
print(f"Location: {loc*100:.4f}%")
print(f"Scale: {scale*100:.4f}%")
print(f"\n95% VaR: {var_95_t*100:.2f}%")
print(f"99% VaR: {var_99_t*100:.2f}%")
print(f"\nNote: Lower degrees of freedom = fatter tails")
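The note above can be made concrete by comparing tail quantiles directly: at the same 1% left-tail level, a Student-t with few degrees of freedom places its quantile much farther out than the normal. A small sketch (a pure distribution comparison, independent of the fitted data above):

```python
from scipy import stats

# 1% left-tail quantile of a standard Student-t vs. the standard normal:
# fewer degrees of freedom -> a farther-out (fatter) tail.
z_normal = stats.norm.ppf(0.01)
print(f"Normal:     {z_normal:.2f}")           # about -2.33
for df in (3, 5, 10, 30):
    print(f"t (df={df:>2}):  {stats.t.ppf(0.01, df):.2f}")
```

As `df` grows, the t quantile converges to the normal one, which is why a high fitted `df` signals that the normal assumption was already adequate.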

7.2.3 Cornish-Fisher Adjustment

The Cornish-Fisher expansion adjusts normal VaR for skewness and kurtosis:

$$z_{CF} = z + \frac{1}{6}(z^2 - 1)S + \frac{1}{24}(z^3 - 3z)(K-3) - \frac{1}{36}(2z^3 - 5z)S^2$$

def cornish_fisher_var(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    VaR with Cornish-Fisher adjustment for skewness and kurtosis.
    
    Args:
        returns: Daily return series
        confidence: Confidence level
    
    Returns:
        Tuple of (VaR, skewness, excess_kurtosis)
    """
    mu = returns.mean()
    sigma = returns.std()
    skew = stats.skew(returns)
    kurt = stats.kurtosis(returns)
    
    z = stats.norm.ppf(1 - confidence)
    z_cf = (z + (z**2 - 1) * skew / 6 + 
            (z**3 - 3*z) * kurt / 24 - 
            (2*z**3 - 5*z) * skew**2 / 36)
    
    var = -(mu + z_cf * sigma)
    return var, skew, kurt

var_95_cf, skew, kurt = cornish_fisher_var(portfolio_returns, 0.95)
var_99_cf, _, _ = cornish_fisher_var(portfolio_returns, 0.99)

print("Cornish-Fisher VaR (Adjusted for Higher Moments)")
print("=" * 50)
print(f"Skewness: {skew:.4f}")
print(f"Excess Kurtosis: {kurt:.4f}")
print(f"\n95% VaR: {var_95_cf*100:.2f}%")
print(f"99% VaR: {var_99_cf*100:.2f}%")
# Compare all parametric methods
print("Parametric VaR Comparison")
print("=" * 55)
print(f"{'Method':<20} {'95% VaR':>12} {'99% VaR':>12}")
print("-" * 50)
print(f"{'Normal':<20} {var_95_normal*100:>11.2f}% {var_99_normal*100:>11.2f}%")
print(f"{'Student-t':<20} {var_95_t*100:>11.2f}% {var_99_t*100:>11.2f}%")
print(f"{'Cornish-Fisher':<20} {var_95_cf*100:>11.2f}% {var_99_cf*100:>11.2f}%")

Exercise 7.1: Portfolio Parametric VaR (Guided)

Your Task: Calculate the parametric VaR for a multi-asset portfolio using the covariance matrix.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def portfolio_parametric_var(returns: pd.DataFrame, 
                             weights: np.ndarray, 
                             confidence: float = 0.95) -> dict:
    cov_matrix = returns.cov()
    mean_returns = returns.mean()
    port_mean = np.dot(weights, mean_returns)
    port_variance = np.dot(weights, np.dot(cov_matrix, weights))
    port_std = np.sqrt(port_variance)

    z = stats.norm.ppf(1 - confidence)
    var = -(port_mean + z * port_std)

    return {
        'var': var,
        'port_mean': port_mean,
        'port_std': port_std
    }

result = portfolio_parametric_var(returns, portfolio_weights, 0.95)
print(f"Portfolio VaR (95%): {result['var']*100:.2f}%")
print(f"Portfolio Mean: {result['port_mean']*100:.4f}%")
print(f"Portfolio Std: {result['port_std']*100:.4f}%")

Section 7.3: Historical VaR

Historical simulation uses actual past returns—no distributional assumptions needed. The VaR is simply a percentile of historical returns.

In this section, you will learn:

  • Basic historical VaR calculation
  • Age-weighted historical VaR
  • Rolling VaR for time-varying risk

7.3.1 Basic Historical VaR

def historical_var(returns: pd.Series, confidence: float = 0.95) -> float:
    """Calculate VaR using historical simulation."""
    var = -np.percentile(returns, (1 - confidence) * 100)
    return var

# Calculate historical VaR
var_95_hist = historical_var(portfolio_returns, 0.95)
var_99_hist = historical_var(portfolio_returns, 0.99)

print("Historical Simulation VaR")
print("=" * 40)
print(f"95% VaR: {var_95_hist*100:.2f}%")
print(f"99% VaR: {var_99_hist*100:.2f}%")

# Show worst days
print(f"\nWorst 5 days in history:")
worst_days = portfolio_returns.nsmallest(5)
for date, ret in worst_days.items():
    print(f"  {date.strftime('%Y-%m-%d')}: {ret*100:.2f}%")

7.3.2 Age-Weighted Historical VaR

Recent data is often more relevant than older data. We can weight observations by recency using exponential decay.

def age_weighted_var(returns: pd.Series, 
                     confidence: float = 0.95, 
                     decay: float = 0.97) -> tuple:
    """
    Historical VaR with exponential age weighting.
    
    Args:
        returns: Return series
        confidence: Confidence level
        decay: Decay factor (lambda)
    
    Returns:
        Tuple of (VaR, weights)
    """
    n = len(returns)
    weights = np.array([decay ** i for i in range(n-1, -1, -1)])
    weights = weights / weights.sum()
    
    sorted_idx = np.argsort(returns)
    sorted_returns = returns.values[sorted_idx]
    sorted_weights = weights[sorted_idx]
    
    cumulative_weights = np.cumsum(sorted_weights)
    var_idx = np.searchsorted(cumulative_weights, 1 - confidence)
    var = -sorted_returns[var_idx]
    
    return var, weights

var_95_aw, weights = age_weighted_var(portfolio_returns, 0.95)
var_99_aw, _ = age_weighted_var(portfolio_returns, 0.99)

print("Age-Weighted Historical VaR (λ = 0.97)")
print("=" * 45)
print(f"95% VaR: {var_95_aw*100:.2f}%")
print(f"99% VaR: {var_99_aw*100:.2f}%")
print(f"\nComparison with equal-weighted:")
print(f"  Equal-weighted 95% VaR: {var_95_hist*100:.2f}%")
print(f"  Age-weighted 95% VaR: {var_95_aw*100:.2f}%")

7.3.3 Rolling Historical VaR

VaR should be monitored over time, not just calculated once.

# Calculate rolling VaR
window = 252  # 1 year

rolling_var_95 = portfolio_returns.rolling(window).apply(
    lambda x: -np.percentile(x, 5)
)
rolling_var_99 = portfolio_returns.rolling(window).apply(
    lambda x: -np.percentile(x, 1)
)

# Plot
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Top: Returns vs VaR
ax1 = axes[0]
ax1.plot(portfolio_returns.index, portfolio_returns * 100, 'gray', alpha=0.5, label='Daily Returns')
ax1.plot(rolling_var_95.index, -rolling_var_95 * 100, 'r-', linewidth=2, label='95% VaR Threshold')
ax1.fill_between(rolling_var_95.index, -rolling_var_95 * 100, -10, alpha=0.2, color='red')
ax1.set_ylabel('Return (%)')
ax1.set_title('Daily Returns vs Rolling 95% VaR', fontsize=12, fontweight='bold')
ax1.legend(loc='upper right')
ax1.set_ylim(-10, 10)

# Bottom: Rolling VaR levels
ax2 = axes[1]
ax2.plot(rolling_var_95.index, rolling_var_95 * 100, 'b-', linewidth=2, label='95% VaR')
ax2.plot(rolling_var_99.index, rolling_var_99 * 100, 'r-', linewidth=2, label='99% VaR')
ax2.set_xlabel('Date')
ax2.set_ylabel('VaR (%)')
ax2.set_title('Rolling VaR Over Time (252-day window)', fontsize=12, fontweight='bold')
ax2.legend()

plt.tight_layout()
plt.show()

Exercise 7.2: Weighted Historical VaR (Guided)

Your Task: Implement a function that calculates historical VaR with custom weighting schemes.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def weighted_historical_var(returns: pd.Series, 
                            weights: np.ndarray,
                            confidence: float = 0.95) -> float:
    weights = weights / weights.sum()
    sorted_idx = np.argsort(returns)
    sorted_returns = returns.values[sorted_idx]
    sorted_weights = weights[sorted_idx]
    cumulative = np.cumsum(sorted_weights)
    var_idx = np.searchsorted(cumulative, 1 - confidence)
    return -sorted_returns[var_idx]

# Test with equal weights
equal_weights = np.ones(len(portfolio_returns))
var_equal = weighted_historical_var(portfolio_returns, equal_weights, 0.95)
print(f"Equal-weighted VaR: {var_equal*100:.2f}%")

# Test with recency weights
recency_weights = np.arange(1, len(portfolio_returns) + 1)
var_recency = weighted_historical_var(portfolio_returns, recency_weights, 0.95)
print(f"Recency-weighted VaR: {var_recency*100:.2f}%")

Exercise 7.3: VaR Method Comparison (Open-ended)

Your Task:

Build a function that compares VaR estimates across multiple methods and confidence levels:

- Calculate VaR using Normal, Student-t, Historical, and Cornish-Fisher methods
- Compare results at both 95% and 99% confidence levels
- Return results as a formatted DataFrame

Your implementation:

Exercise
Click to reveal solution
def compare_var_methods(returns: pd.Series, 
                        confidence_levels: list = [0.95, 0.99]) -> pd.DataFrame:
    """
    Compare VaR estimates across multiple methods.

    Args:
        returns: Return series
        confidence_levels: List of confidence levels

    Returns:
        DataFrame with VaR comparisons
    """
    results = []

    for conf in confidence_levels:
        # Normal VaR
        mu, sigma = returns.mean(), returns.std()
        z = stats.norm.ppf(1 - conf)
        var_normal = -(mu + z * sigma)

        # Student-t VaR
        df, loc, scale = stats.t.fit(returns)
        var_t = -stats.t.ppf(1 - conf, df, loc, scale)

        # Historical VaR
        var_hist = -np.percentile(returns, (1 - conf) * 100)

        # Cornish-Fisher VaR
        skew = stats.skew(returns)
        kurt = stats.kurtosis(returns)
        z_cf = (z + (z**2 - 1) * skew / 6 + 
                (z**3 - 3*z) * kurt / 24 - 
                (2*z**3 - 5*z) * skew**2 / 36)
        var_cf = -(mu + z_cf * sigma)

        results.append({
            'Confidence': f"{int(conf*100)}%",
            'Normal': f"{var_normal*100:.2f}%",
            'Student-t': f"{var_t*100:.2f}%",
            'Historical': f"{var_hist*100:.2f}%",
            'Cornish-Fisher': f"{var_cf*100:.2f}%"
        })

    return pd.DataFrame(results)

# Test
comparison = compare_var_methods(portfolio_returns)
print("VaR Method Comparison")
print("=" * 60)
print(comparison.to_string(index=False))

Section 7.4: Monte Carlo VaR

Monte Carlo simulation generates thousands of possible scenarios based on assumed distributions. This is flexible—we can model fat tails, correlations, and complex portfolios.

In this section, you will learn:

- Basic Monte Carlo VaR with normal distribution
- Monte Carlo with fat tails (Student-t)
- Correlated multi-asset Monte Carlo

7.4.1 Basic Monte Carlo VaR

def monte_carlo_var(mu: float, 
                    sigma: float, 
                    confidence: float = 0.95, 
                    n_simulations: int = 10000, 
                    seed: int = 42) -> tuple:
    """
    Monte Carlo VaR assuming normal distribution.
    
    Args:
        mu: Mean return
        sigma: Standard deviation
        confidence: Confidence level
        n_simulations: Number of simulations
        seed: Random seed
    
    Returns:
        Tuple of (VaR, simulated_returns)
    """
    np.random.seed(seed)
    simulated_returns = np.random.normal(mu, sigma, n_simulations)
    var = -np.percentile(simulated_returns, (1 - confidence) * 100)
    return var, simulated_returns

# Calculate MC VaR
mu = portfolio_returns.mean()
sigma = portfolio_returns.std()

var_95_mc, sim_returns = monte_carlo_var(mu, sigma, 0.95)
var_99_mc, _ = monte_carlo_var(mu, sigma, 0.99)

print("Monte Carlo VaR (10,000 simulations)")
print("=" * 45)
print(f"95% VaR: {var_95_mc*100:.2f}%")
print(f"99% VaR: {var_99_mc*100:.2f}%")
# Visualize simulated distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Historical vs Simulated
ax1 = axes[0]
ax1.hist(portfolio_returns * 100, bins=50, density=True, alpha=0.7, label='Historical')
ax1.hist(sim_returns * 100, bins=50, density=True, alpha=0.5, label='MC Simulated')
ax1.axvline(x=-var_95_hist*100, color='blue', linestyle='--', label=f'Hist VaR: {var_95_hist*100:.2f}%')
ax1.axvline(x=-var_95_mc*100, color='orange', linestyle='--', label=f'MC VaR: {var_95_mc*100:.2f}%')
ax1.set_xlabel('Daily Return (%)')
ax1.set_ylabel('Density')
ax1.set_title('Historical vs Monte Carlo Distribution', fontweight='bold')
ax1.legend()

# Q-Q plot
ax2 = axes[1]
stats.probplot(portfolio_returns, dist="norm", plot=ax2)
ax2.set_title('Q-Q Plot: Returns vs Normal', fontweight='bold')

plt.tight_layout()
plt.show()

print("\nQ-Q plot shows fat tails: extreme returns exceed normal expectations.")

7.4.2 Monte Carlo with Fat Tails

def monte_carlo_var_t(returns: pd.Series, 
                      confidence: float = 0.95, 
                      n_simulations: int = 10000, 
                      seed: int = 42) -> tuple:
    """
    Monte Carlo VaR using fitted Student-t distribution.
    
    Args:
        returns: Historical returns for fitting
        confidence: Confidence level
        n_simulations: Number of simulations
        seed: Random seed
    
    Returns:
        Tuple of (VaR, simulated_returns, degrees_of_freedom)
    """
    np.random.seed(seed)
    df, loc, scale = stats.t.fit(returns)
    simulated = stats.t.rvs(df, loc=loc, scale=scale, size=n_simulations)
    var = -np.percentile(simulated, (1 - confidence) * 100)
    return var, simulated, df

var_95_mc_t, sim_t, df = monte_carlo_var_t(portfolio_returns, 0.95)
var_99_mc_t, _, _ = monte_carlo_var_t(portfolio_returns, 0.99)

print("Monte Carlo VaR with Student-t Distribution")
print("=" * 50)
print(f"Fitted degrees of freedom: {df:.2f}")
print(f"\n95% VaR: {var_95_mc_t*100:.2f}%")
print(f"99% VaR: {var_99_mc_t*100:.2f}%")
print(f"\nFat tails increase extreme risk estimates by {((var_99_mc_t - var_99_mc)/var_99_mc)*100:.1f}%")

7.4.3 Correlated Multi-Asset Monte Carlo

def multivariate_mc_var(returns: pd.DataFrame, 
                        weights: np.ndarray, 
                        confidence: float = 0.95, 
                        n_simulations: int = 10000, 
                        seed: int = 42) -> tuple:
    """
    Monte Carlo VaR for multi-asset portfolio with correlations.
    
    Args:
        returns: DataFrame of asset returns
        weights: Portfolio weights
        confidence: Confidence level
        n_simulations: Number of simulations
        seed: Random seed
    
    Returns:
        Tuple of (VaR, portfolio_returns, asset_returns)
    """
    np.random.seed(seed)
    mean_returns = returns.mean().values
    cov_matrix = returns.cov().values
    
    simulated_asset_returns = np.random.multivariate_normal(
        mean_returns, cov_matrix, n_simulations
    )
    
    simulated_portfolio_returns = simulated_asset_returns @ weights
    var = -np.percentile(simulated_portfolio_returns, (1 - confidence) * 100)
    
    return var, simulated_portfolio_returns, simulated_asset_returns

var_95_multi, sim_port, sim_assets = multivariate_mc_var(returns, portfolio_weights, 0.95)
var_99_multi, _, _ = multivariate_mc_var(returns, portfolio_weights, 0.99)

print("Multi-Asset Monte Carlo VaR (with Correlations)")
print("=" * 50)
print(f"Portfolio: {dict(zip(tickers, portfolio_weights))}")
print(f"\n95% VaR: {var_95_multi*100:.2f}%")
print(f"99% VaR: {var_99_multi*100:.2f}%")
# Compare all VaR methods
print("\nVaR Method Comparison")
print("=" * 60)
print(f"{'Method':<25} {'95% VaR':>12} {'99% VaR':>12}")
print("-" * 55)
print(f"{'Parametric (Normal)':<25} {var_95_normal*100:>11.2f}% {var_99_normal*100:>11.2f}%")
print(f"{'Parametric (Student-t)':<25} {var_95_t*100:>11.2f}% {var_99_t*100:>11.2f}%")
print(f"{'Parametric (CF)':<25} {var_95_cf*100:>11.2f}% {var_99_cf*100:>11.2f}%")
print(f"{'Historical':<25} {var_95_hist*100:>11.2f}% {var_99_hist*100:>11.2f}%")
print(f"{'Age-Weighted Historical':<25} {var_95_aw*100:>11.2f}% {var_99_aw*100:>11.2f}%")
print(f"{'Monte Carlo (Normal)':<25} {var_95_mc*100:>11.2f}% {var_99_mc*100:>11.2f}%")
print(f"{'Monte Carlo (t-dist)':<25} {var_95_mc_t*100:>11.2f}% {var_99_mc_t*100:>11.2f}%")
print(f"{'MC Multi-Asset':<25} {var_95_multi*100:>11.2f}% {var_99_multi*100:>11.2f}%")

Exercise 7.4: Monte Carlo Simulation Engine (Guided)

Your Task: Build a Monte Carlo simulator that can use different distribution assumptions.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def monte_carlo_engine(returns: pd.Series,
                       n_simulations: int = 10000,
                       distribution: str = 'normal',
                       seed: int = 42) -> np.ndarray:
    np.random.seed(seed)

    if distribution == 'normal':
        mu = returns.mean()
        sigma = returns.std()
        simulated = np.random.normal(mu, sigma, n_simulations)

    elif distribution == 't':
        df, loc, scale = stats.t.fit(returns)
        simulated = stats.t.rvs(df, loc=loc, scale=scale, size=n_simulations)

    else:
        # Fail loudly instead of raising NameError on an unknown distribution
        raise ValueError(f"Unknown distribution: '{distribution}'")

    return simulated

# Test
sim_normal = monte_carlo_engine(portfolio_returns, distribution='normal')
sim_t = monte_carlo_engine(portfolio_returns, distribution='t')

print("Monte Carlo Results")
print(f"Normal 95% VaR: {-np.percentile(sim_normal, 5)*100:.2f}%")
print(f"Normal 99% VaR: {-np.percentile(sim_normal, 1)*100:.2f}%")
print(f"Student-t 95% VaR: {-np.percentile(sim_t, 5)*100:.2f}%")
print(f"Student-t 99% VaR: {-np.percentile(sim_t, 1)*100:.2f}%")

Exercise 7.5: VaR Backtesting (Open-ended)

Your Task:

Build a VaR backtesting function that:

- Calculates rolling VaR over a specified window
- Counts how many times actual losses exceeded VaR (violations)
- Compares violation rate to expected rate
- Returns detailed statistics and violation dates

Your implementation:

Exercise
Click to reveal solution
def backtest_var(returns: pd.Series,
                 window: int = 252,
                 confidence: float = 0.95,
                 method: str = 'historical') -> dict:
    """
    Backtest VaR model by comparing predictions to actual losses.

    Args:
        returns: Return series
        window: Rolling window for VaR calculation
        confidence: Confidence level
        method: 'historical' or 'parametric'

    Returns:
        Dictionary with backtest results
    """
    # Calculate rolling VaR
    if method == 'historical':
        rolling_var = returns.rolling(window).apply(
            lambda x: -np.percentile(x, (1-confidence)*100)
        )
    else:  # parametric
        def calc_param_var(x):
            mu, sigma = x.mean(), x.std()
            z = stats.norm.ppf(1 - confidence)
            return -(mu + z * sigma)
        rolling_var = returns.rolling(window).apply(calc_param_var)

    rolling_var = rolling_var.dropna()
    aligned_returns = returns.loc[rolling_var.index]

    # Find violations (actual loss > VaR)
    violations = aligned_returns < -rolling_var
    violation_dates = violations[violations].index

    # Statistics
    total_days = len(rolling_var)
    violation_count = violations.sum()
    expected_rate = 1 - confidence
    actual_rate = violation_count / total_days

    # Kupiec-style coverage check via an exact two-sided binomial test
    # (scipy.stats.binom_test was removed in SciPy 1.12; use binomtest)
    p_value = stats.binomtest(int(violation_count), total_days, expected_rate,
                              alternative='two-sided').pvalue

    return {
        'total_days': total_days,
        'violations': violation_count,
        'expected_violations': int(total_days * expected_rate),
        'expected_rate': expected_rate,
        'actual_rate': actual_rate,
        'p_value': p_value,
        'model_valid': p_value > 0.05,
        'violation_dates': violation_dates,
        'rolling_var': rolling_var
    }

# Run backtest
results = backtest_var(portfolio_returns, window=252, confidence=0.95)

print("VaR Backtest Results")
print("=" * 50)
print(f"Total days tested: {results['total_days']}")
print(f"Expected violations (5%): {results['expected_violations']}")
print(f"Actual violations: {results['violations']}")
print(f"Actual rate: {results['actual_rate']*100:.2f}%")
print(f"P-value (Kupiec test): {results['p_value']:.4f}")
print(f"Model valid (p > 0.05): {results['model_valid']}")
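The solution approximates the Kupiec test with an exact binomial test. The original Kupiec proportion-of-failures (POF) statistic is a likelihood-ratio test compared against a chi-squared distribution with one degree of freedom. A minimal sketch (the function name and the illustrative violation counts are hypothetical):

```python
import numpy as np
from scipy import stats

def kupiec_pof_pvalue(violations: int, total_days: int,
                      confidence: float = 0.95) -> float:
    """P-value of the Kupiec proportion-of-failures LR test."""
    p = 1 - confidence                  # expected violation rate
    x, n = violations, total_days
    phat = x / n                        # observed violation rate
    eps = 1e-12                         # guard against log(0) at the boundaries
    lr = -2 * ((n - x) * np.log((1 - p) / max(1 - phat, eps))
               + x * np.log(p / max(phat, eps)))
    return 1 - stats.chi2.cdf(lr, df=1)

# 13 violations in 252 days is close to the expected 12.6 -> large p-value
print(f"{kupiec_pof_pvalue(13, 252):.4f}")
# 40 violations is far too many -> the model is rejected
print(f"{kupiec_pof_pvalue(40, 252):.4f}")
```

Either formulation asks the same question: is the observed violation count consistent with the promised coverage rate?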

Exercise 7.6: Complete VaR Analysis System (Open-ended)

Your Task:

Build a comprehensive VaR class that includes:

- Multiple calculation methods (parametric, historical, Monte Carlo)
- Time scaling to different horizons
- Dollar VaR conversion for a given portfolio value
- Backtesting capability
- Summary report method

Your implementation:

Exercise
Click to reveal solution
class VaRAnalyzer:
    """
    Comprehensive Value at Risk analyzer.

    Supports multiple calculation methods, time scaling,
    and backtesting capabilities.
    """

    def __init__(self, returns: pd.Series, portfolio_value: float = 1_000_000):
        self.returns = returns
        self.portfolio_value = portfolio_value
        self.mu = returns.mean()
        self.sigma = returns.std()
        self.results = {}

    def parametric_normal(self, confidence: float = 0.95) -> float:
        z = stats.norm.ppf(1 - confidence)
        var = -(self.mu + z * self.sigma)
        self.results[f'normal_{int(confidence*100)}'] = var
        return var

    def parametric_t(self, confidence: float = 0.95) -> float:
        params = stats.t.fit(self.returns)
        var = -stats.t.ppf(1 - confidence, *params)
        self.results[f't_{int(confidence*100)}'] = var
        return var

    def historical(self, confidence: float = 0.95) -> float:
        var = -np.percentile(self.returns, (1 - confidence) * 100)
        self.results[f'hist_{int(confidence*100)}'] = var
        return var

    def monte_carlo(self, confidence: float = 0.95, 
                    n_sims: int = 10000) -> float:
        np.random.seed(42)
        sims = np.random.normal(self.mu, self.sigma, n_sims)
        var = -np.percentile(sims, (1 - confidence) * 100)
        self.results[f'mc_{int(confidence*100)}'] = var
        return var

    def scale_var(self, daily_var: float, horizon: int) -> float:
        """Scale daily VaR to different time horizon."""
        return daily_var * np.sqrt(horizon)

    def dollar_var(self, var_pct: float) -> float:
        """Convert percentage VaR to dollar amount."""
        return self.portfolio_value * var_pct

    def calculate_all(self, confidence: float = 0.95) -> dict:
        return {
            'normal': self.parametric_normal(confidence),
            't': self.parametric_t(confidence),
            'historical': self.historical(confidence),
            'monte_carlo': self.monte_carlo(confidence)
        }

    def backtest(self, window: int = 252, confidence: float = 0.95) -> dict:
        rolling_var = self.returns.rolling(window).apply(
            lambda x: -np.percentile(x, (1-confidence)*100)
        ).dropna()

        aligned = self.returns.loc[rolling_var.index]
        violations = (aligned < -rolling_var).sum()
        expected = len(rolling_var) * (1 - confidence)

        return {
            'violations': violations,
            'expected': int(expected),
            'rate': violations / len(rolling_var)
        }

    def summary(self):
        print("\n" + "=" * 60)
        print("VaR ANALYSIS SUMMARY")
        print("=" * 60)
        print(f"Portfolio Value: ${self.portfolio_value:,.0f}")

        for conf in [0.95, 0.99]:
            results = self.calculate_all(conf)
            print(f"\n{int(conf*100)}% Confidence Level:")
            print("-" * 40)
            for method, var in results.items():
                dollar = self.dollar_var(var)
                print(f"  {method:<15}: {var*100:>6.2f}% (${dollar:>12,.0f})")

        bt = self.backtest()
        print(f"\nBacktest Results:")
        print(f"  Violations: {bt['violations']} (expected: {bt['expected']})")
        print(f"  Rate: {bt['rate']*100:.2f}%")

# Test
analyzer = VaRAnalyzer(portfolio_returns, portfolio_value=1_000_000)
analyzer.summary()

Module Project: Production VaR Risk System

Build a comprehensive VaR calculation and monitoring system suitable for production use.

# YOUR CODE HERE - Module Project
Click to reveal solution
class ProductionVaRSystem:
    """
    Production-ready Value at Risk system.

    Features:
    - Multiple VaR calculation methods
    - Portfolio-level and asset-level analysis
    - Backtesting and violation tracking
    - Time horizon scaling
    - Comprehensive reporting
    """

    def __init__(self, returns: pd.DataFrame, 
                 weights: np.ndarray = None,
                 portfolio_value: float = 1_000_000):
        """
        Initialize VaR system.

        Args:
            returns: DataFrame of asset returns
            weights: Portfolio weights (equal if None)
            portfolio_value: Dollar value of portfolio
        """
        self.returns = returns
        self.assets = list(returns.columns)
        self.weights = weights if weights is not None else \
                       np.ones(len(self.assets)) / len(self.assets)
        self.portfolio_value = portfolio_value

        # Calculate portfolio returns
        self.portfolio_returns = returns.dot(self.weights)

        # Store results
        self.var_results = {}

    def parametric_var(self, confidence: float = 0.95, 
                       distribution: str = 'normal') -> dict:
        """Calculate parametric VaR."""
        ret = self.portfolio_returns

        if distribution == 'normal':
            mu, sigma = ret.mean(), ret.std()
            z = stats.norm.ppf(1 - confidence)
            var = -(mu + z * sigma)
        elif distribution == 't':
            params = stats.t.fit(ret)
            var = -stats.t.ppf(1 - confidence, *params)
        elif distribution == 'cornish_fisher':
            mu, sigma = ret.mean(), ret.std()
            skew = stats.skew(ret)
            kurt = stats.kurtosis(ret)
            z = stats.norm.ppf(1 - confidence)
            z_cf = (z + (z**2 - 1) * skew / 6 + 
                    (z**3 - 3*z) * kurt / 24 - 
                    (2*z**3 - 5*z) * skew**2 / 36)
            var = -(mu + z_cf * sigma)
        else:
            raise ValueError(f"Unknown distribution: '{distribution}'")

        self.var_results[f'parametric_{distribution}_{int(confidence*100)}'] = var
        return {'var': var, 'dollar_var': var * self.portfolio_value}

    def historical_var(self, confidence: float = 0.95,
                       weighted: bool = False,
                       decay: float = 0.97) -> dict:
        """Calculate historical VaR."""
        ret = self.portfolio_returns

        if not weighted:
            var = -np.percentile(ret, (1 - confidence) * 100)
        else:
            n = len(ret)
            weights = np.array([decay ** i for i in range(n-1, -1, -1)])
            weights = weights / weights.sum()

            sorted_idx = np.argsort(ret)
            sorted_returns = ret.values[sorted_idx]
            sorted_weights = weights[sorted_idx]
            cumulative = np.cumsum(sorted_weights)
            var_idx = np.searchsorted(cumulative, 1 - confidence)
            var = -sorted_returns[var_idx]

        method = 'historical_weighted' if weighted else 'historical'
        self.var_results[f'{method}_{int(confidence*100)}'] = var
        return {'var': var, 'dollar_var': var * self.portfolio_value}

    def monte_carlo_var(self, confidence: float = 0.95,
                        n_sims: int = 10000,
                        multivariate: bool = False) -> dict:
        """Calculate Monte Carlo VaR."""
        np.random.seed(42)

        if not multivariate:
            mu = self.portfolio_returns.mean()
            sigma = self.portfolio_returns.std()
            sims = np.random.normal(mu, sigma, n_sims)
        else:
            mean_returns = self.returns.mean().values
            cov_matrix = self.returns.cov().values
            asset_sims = np.random.multivariate_normal(
                mean_returns, cov_matrix, n_sims
            )
            sims = asset_sims @ self.weights

        var = -np.percentile(sims, (1 - confidence) * 100)

        method = 'mc_multivariate' if multivariate else 'mc_univariate'
        self.var_results[f'{method}_{int(confidence*100)}'] = var
        return {'var': var, 'dollar_var': var * self.portfolio_value}

    def scale_to_horizon(self, daily_var: float, horizon: int) -> float:
        """Scale daily VaR to different time horizon."""
        return daily_var * np.sqrt(horizon)

    def calculate_all(self, confidence: float = 0.95) -> pd.DataFrame:
        """Calculate VaR using all methods."""
        results = []

        # Parametric methods
        for dist in ['normal', 't', 'cornish_fisher']:
            res = self.parametric_var(confidence, dist)
            results.append({
                'Method': f'Parametric ({dist})',
                'VaR (%)': res['var'] * 100,
                'Dollar VaR': res['dollar_var']
            })

        # Historical methods
        for weighted in [False, True]:
            res = self.historical_var(confidence, weighted)
            name = 'Historical (weighted)' if weighted else 'Historical'
            results.append({
                'Method': name,
                'VaR (%)': res['var'] * 100,
                'Dollar VaR': res['dollar_var']
            })

        # Monte Carlo methods
        for multi in [False, True]:
            res = self.monte_carlo_var(confidence, multivariate=multi)
            name = 'Monte Carlo (multi)' if multi else 'Monte Carlo'
            results.append({
                'Method': name,
                'VaR (%)': res['var'] * 100,
                'Dollar VaR': res['dollar_var']
            })

        return pd.DataFrame(results)

    def backtest(self, window: int = 252, confidence: float = 0.95) -> dict:
        """Backtest VaR model."""
        rolling_var = self.portfolio_returns.rolling(window).apply(
            lambda x: -np.percentile(x, (1 - confidence) * 100)
        ).dropna()

        aligned = self.portfolio_returns.loc[rolling_var.index]
        violations = aligned < -rolling_var
        violation_count = violations.sum()
        total_days = len(rolling_var)
        expected = total_days * (1 - confidence)

        return {
            'total_days': total_days,
            'violations': violation_count,
            'expected': int(expected),
            'rate': violation_count / total_days,
            'expected_rate': 1 - confidence,
            # Rough check: within ~2 standard deviations of the expected binomial count
            'pass': abs(violation_count - expected) < 2 * np.sqrt(expected)
        }

    def report(self):
        """Generate comprehensive VaR report."""
        print("\n" + "=" * 70)
        print("PRODUCTION VaR RISK REPORT")
        print("=" * 70)
        print(f"\nPortfolio Value: ${self.portfolio_value:,.0f}")
        print(f"Assets: {self.assets}")
        print(f"Weights: {dict(zip(self.assets, self.weights))}")

        # VaR at multiple confidence levels
        for conf in [0.95, 0.99]:
            print(f"\n{'='*70}")
            print(f"{int(conf*100)}% VALUE AT RISK")
            print("=" * 70)

            df = self.calculate_all(conf)
            df['Dollar VaR'] = df['Dollar VaR'].apply(lambda x: f"${x:,.0f}")
            df['VaR (%)'] = df['VaR (%)'].apply(lambda x: f"{x:.2f}%")
            print(df.to_string(index=False))

        # Time horizon scaling
        print(f"\n{'='*70}")
        print("TIME HORIZON SCALING (95% Historical VaR)")
        print("=" * 70)
        base_var = -np.percentile(self.portfolio_returns, 5)
        horizons = {'1 Day': 1, '1 Week': 5, '2 Weeks': 10, 
                    '1 Month': 21, '1 Quarter': 63}
        for name, days in horizons.items():
            scaled = self.scale_to_horizon(base_var, days)
            dollar = scaled * self.portfolio_value
            print(f"  {name:<12}: {scaled*100:>6.2f}% (${dollar:>12,.0f})")

        # Backtest
        print(f"\n{'='*70}")
        print("BACKTEST RESULTS")
        print("=" * 70)
        bt = self.backtest()
        print(f"  Total days tested: {bt['total_days']}")
        print(f"  Expected violations (5%): {bt['expected']}")
        print(f"  Actual violations: {bt['violations']}")
        print(f"  Violation rate: {bt['rate']*100:.2f}%")
        print(f"  Model status: {'PASS' if bt['pass'] else 'FAIL'}")

# Test the production system
system = ProductionVaRSystem(
    returns=returns,
    weights=portfolio_weights,
    portfolio_value=1_000_000
)
system.report()

Key Takeaways

What You Learned

1. VaR Fundamentals

  • VaR is the loss threshold exceeded with probability 1 − c at confidence level c
  • Common parameters: 95%/99% confidence, 1-day horizon
  • Square root of time rule for scaling horizons
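The square-root-of-time rule scales a 1-day VaR to an h-day horizon by multiplying by √h, which is strictly valid only for i.i.d. returns with zero drift. A quick sketch with an illustrative (not data-derived) 2% daily VaR:

```python
import numpy as np

daily_var = 0.02                      # illustrative 2% one-day VaR
for h in (5, 10, 21):                 # week, two weeks, one month of trading days
    scaled = daily_var * np.sqrt(h)   # square-root-of-time scaling
    print(f"{h:>2}-day VaR ≈ {scaled * 100:.2f}%")
```

Note that volatility clustering and autocorrelation in real returns make this rule an approximation, typically understating risk over longer horizons.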

2. Parametric VaR

  • Normal: $VaR = -(\mu + z\,\sigma)$, where $z = \Phi^{-1}(1 - c)$ is negative at high confidence $c$
  • Student-t captures fat tails
  • Cornish-Fisher adjusts for skewness and kurtosis

3. Historical VaR

  • No distributional assumptions required
  • Age-weighted for recency bias
  • Rolling VaR for time-varying risk

4. Monte Carlo VaR

  • Flexible for complex portfolios
  • Can incorporate correlations and fat tails
  • Multivariate simulation for portfolio risk

VaR Limitations

  • Doesn't tell you how bad losses can be beyond VaR
  • Not sub-additive (can violate diversification)
  • Assumes historical patterns continue
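The sub-additivity failure can be demonstrated with a toy example (hypothetical numbers, not the module's market data): two independent bonds that each lose 100 with 4% probability. Individually, each bond's 95% VaR is zero because the default probability sits below the 5% tail, yet the two-bond portfolio defaults at least once about 7.8% of the time, so its VaR is positive and diversification appears to increase risk:

```python
import numpy as np

# Hypothetical setup: two independent bonds, each losing 100 with 4% probability
rng = np.random.default_rng(0)
n = 100_000
loss_a = np.where(rng.random(n) < 0.04, -100.0, 0.0)
loss_b = np.where(rng.random(n) < 0.04, -100.0, 0.0)

def var95(pnl: np.ndarray) -> float:
    """95% historical VaR: negative of the 5th percentile of P&L."""
    return -np.percentile(pnl, 5)

# Each bond alone: 4% default probability is inside the 5% tail, so VaR = 0
print(var95(loss_a), var95(loss_b))   # ≈ 0.0 and 0.0

# Combined: P(at least one default) ≈ 7.8% > 5%, so VaR jumps to 100
print(var95(loss_a + loss_b))         # ≈ 100.0, far above 0 + 0
```

Expected Shortfall, covered next module, does not suffer from this defect: it is a coherent risk measure and is always sub-additive.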

Coming Up Next

In Module 8: Beyond VaR, we'll explore:

- Expected Shortfall (CVaR) for tail risk
- Stress testing with historical and hypothetical scenarios
- Drawdown analysis and duration risk
- Advanced tail risk measures


Congratulations on completing Module 7!

Module 8: Beyond VaR

Course 3: Quantitative Finance & Portfolio Theory
Part 3: Risk Modeling


Learning Objectives

By the end of this module, you will be able to:

  1. Calculate Expected Shortfall (CVaR) and understand its advantages over VaR
  2. Design and implement historical and hypothetical stress tests
  3. Analyze drawdowns and duration risk
  4. Apply advanced tail risk measures including Omega and Sortino ratios
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 7: Value at Risk

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Data

# Download portfolio data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2006-01-01', end='2024-01-01', progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.iloc[:, :len(tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

prices.columns = [str(col) for col in prices.columns]
returns = prices.pct_change().dropna()

# Portfolio weights
weights = np.array([0.40, 0.20, 0.25, 0.15])
portfolio_returns = returns.dot(weights)

print(f"Data loaded: {returns.index[0].strftime('%Y-%m-%d')} to {returns.index[-1].strftime('%Y-%m-%d')}")
print(f"Assets: {list(returns.columns)}")
print(f"Total observations: {len(returns)}")

Section 8.1: Expected Shortfall (CVaR)

VaR tells you the threshold loss at a given confidence level, but it says nothing about how bad things can get when you exceed that threshold. Expected Shortfall (ES), also called Conditional VaR (CVaR), addresses this limitation.

In this section, you will learn:

- The definition and interpretation of Expected Shortfall
- Why regulators prefer ES over VaR
- Multiple calculation methods for ES

8.1.1 Expected Shortfall Definition

Expected Shortfall at confidence level $\alpha$ measures the expected loss given that we've exceeded VaR:

$$ES_{\alpha} = E[L | L > VaR_{\alpha}]$$

In plain English: "When bad days happen, how bad are they on average?"

def calculate_var_es(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    Calculate VaR and Expected Shortfall using historical simulation.
    
    Args:
        returns: Historical returns
        confidence: Confidence level (e.g., 0.95 for 95%)
    
    Returns:
        Tuple of (VaR, ES) both as positive numbers representing losses
    """
    returns_arr = np.array(returns)
    alpha = 1 - confidence
    
    # VaR is the alpha quantile of returns
    var = -np.percentile(returns_arr, alpha * 100)
    
    # ES is the average of returns worse than VaR
    threshold = np.percentile(returns_arr, alpha * 100)
    tail_returns = returns_arr[returns_arr <= threshold]
    es = -np.mean(tail_returns)
    
    return var, es

# Calculate for SPY
spy_returns = returns['SPY']
var_95, es_95 = calculate_var_es(spy_returns, 0.95)
var_99, es_99 = calculate_var_es(spy_returns, 0.99)

print("SPY Risk Measures (Historical)")
print("=" * 40)
print(f"\n95% Confidence:")
print(f"  VaR:               {var_95*100:.2f}%")
print(f"  Expected Shortfall: {es_95*100:.2f}%")
print(f"  ES/VaR Ratio:      {es_95/var_95:.2f}x")
print(f"\n99% Confidence:")
print(f"  VaR:               {var_99*100:.2f}%")
print(f"  Expected Shortfall: {es_99*100:.2f}%")
print(f"  ES/VaR Ratio:      {es_99/var_99:.2f}x")
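For reference (an added aside, independent of the SPY data above): under a normal distribution both VaR and ES scale with volatility, so their ratio is a fixed constant at each confidence level, roughly 1.25 at 95%. Empirical ratios well above that level are evidence of tails fatter than the normal.

```python
from scipy import stats

# Under a zero-mean normal distribution, VaR and ES are both multiples of
# sigma, so the ES/VaR ratio depends only on the confidence level.
alpha = 0.05                             # tail probability for 95% confidence
z = stats.norm.ppf(alpha)                # 5% quantile, about -1.645
var_sigma = -z                           # VaR in units of sigma
es_sigma = stats.norm.pdf(z) / alpha     # ES in units of sigma
print(f"Normal ES/VaR ratio at 95%: {es_sigma / var_sigma:.3f}")
```

Comparing the historical ES/VaR ratios printed above against this benchmark is a quick fat-tail diagnostic.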
# Visualize the difference between VaR and ES
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Distribution with VaR and ES
ax1 = axes[0]
var_threshold = np.percentile(spy_returns, 5)

n, bins, patches = ax1.hist(spy_returns, bins=100, density=True, alpha=0.7, 
                            color='steelblue', edgecolor='white')

# Color the tail
for i, (patch, left_edge) in enumerate(zip(patches, bins[:-1])):
    if left_edge < var_threshold:
        patch.set_facecolor('crimson')

ax1.axvline(-var_95, color='orange', linewidth=2, linestyle='--', 
           label=f'95% VaR: {var_95*100:.2f}%')
ax1.axvline(-es_95, color='darkred', linewidth=2, linestyle='-', 
           label=f'95% ES: {es_95*100:.2f}%')
ax1.set_xlabel('Daily Return')
ax1.set_ylabel('Density')
ax1.set_title('VaR vs Expected Shortfall\n(Red area = Tail losses beyond VaR)')
ax1.legend()
ax1.set_xlim(-0.12, 0.12)

# Right plot: Tail losses only
ax2 = axes[1]
tail_losses = -spy_returns[spy_returns < var_threshold]
ax2.hist(tail_losses * 100, bins=30, alpha=0.7, color='crimson', edgecolor='white')
ax2.axvline(var_95 * 100, color='orange', linewidth=2, linestyle='--', 
           label=f'VaR: {var_95*100:.2f}%')
ax2.axvline(es_95 * 100, color='darkred', linewidth=2, linestyle='-', 
           label=f'ES (avg): {es_95*100:.2f}%')
ax2.set_xlabel('Loss (%)')
ax2.set_ylabel('Frequency')
ax2.set_title(f'Distribution of Tail Losses\n({len(tail_losses)} observations beyond VaR)')
ax2.legend()

plt.tight_layout()
plt.show()

print(f"\nKey Insight: When losses exceed VaR ({var_95*100:.2f}%), they average {es_95*100:.2f}%")

8.1.2 Why Regulators Prefer Expected Shortfall

The Basel Committee shifted from VaR to ES for several reasons:

| Property                | VaR | Expected Shortfall |
|-------------------------|-----|--------------------|
| Captures tail risk      | No  | Yes                |
| Coherent risk measure   | No  | Yes                |
| Sub-additive            | No  | Yes                |
| Penalizes concentration | No  | Yes                |

Sub-additivity is crucial: $\text{Risk}(A + B) \leq \text{Risk}(A) + \text{Risk}(B)$. A merged portfolio should never be reported as riskier than the sum of its parts, yet VaR can violate this property while ES never does.
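VaR's failure of sub-additivity can be shown with a toy sketch (illustrative numbers, not part of the course dataset): two independent bonds that each default with 4% probability have an individual 95% VaR of zero, because the default probability sits inside the 5% tail. Held together, at least one default occurs with probability $1 - 0.96^2 = 7.84\%$, so the combined VaR jumps.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Each hypothetical bond loses 100 on default (4% probability), else nothing.
loss_a = np.where(rng.random(n) < 0.04, 100.0, 0.0)
loss_b = np.where(rng.random(n) < 0.04, 100.0, 0.0)

def var_95(losses: np.ndarray) -> float:
    """95% VaR of a loss distribution (loss as a positive number)."""
    return np.percentile(losses, 95)

# Individually: default probability (4%) < tail (5%), so 95% VaR = 0.
# Combined: P(at least one default) = 7.84% > 5%, so 95% VaR = 100.
print(var_95(loss_a), var_95(loss_b), var_95(loss_a + loss_b))
```

Here $VaR(A+B) = 100 > VaR(A) + VaR(B) = 0$: combining the positions looks riskier than holding them apart, which is exactly the pathology ES avoids.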

# Demonstrate sub-additivity with portfolio diversification
print("Sub-Additivity Test: Diversification Benefits")
print("=" * 50)

# Individual asset risk measures
results = []
for asset in returns.columns:
    var, es = calculate_var_es(returns[asset], 0.95)
    results.append({'Asset': asset, 'VaR_95': var, 'ES_95': es})
    print(f"{asset}: VaR = {var*100:.2f}%, ES = {es*100:.2f}%")

# Equal-weighted portfolio
port_returns = returns.mean(axis=1)
port_var, port_es = calculate_var_es(port_returns, 0.95)

# Average of individual risks
df_results = pd.DataFrame(results)
avg_var = df_results['VaR_95'].mean()
avg_es = df_results['ES_95'].mean()

print(f"\n{'='*50}")
print(f"Portfolio (equal-weight):")
print(f"  VaR:  {port_var*100:.2f}%  (vs avg individual: {avg_var*100:.2f}%)")
print(f"  ES:   {port_es*100:.2f}%  (vs avg individual: {avg_es*100:.2f}%)")
print(f"\nDiversification Benefit:")
print(f"  VaR reduction: {(1 - port_var/avg_var)*100:.1f}%")
print(f"  ES reduction:  {(1 - port_es/avg_es)*100:.1f}%")

8.1.3 Parametric Expected Shortfall

For normally distributed returns, ES has a closed-form solution:

$$ES_{\alpha} = -\mu + \sigma \cdot \frac{\phi(z_{1-\alpha})}{1-\alpha}$$

where $\alpha$ is the confidence level, $z_{1-\alpha} = \Phi^{-1}(1-\alpha)$ is the standard normal quantile at the tail probability, $\phi$ is the standard normal density, and ES is reported as a positive loss.

def parametric_es(returns: pd.Series, 
                  confidence: float = 0.95, 
                  distribution: str = 'normal') -> tuple:
    """
    Calculate Expected Shortfall using parametric methods.
    
    Args:
        returns: Historical returns for parameter estimation
        confidence: Confidence level
        distribution: 'normal' or 't' for Student-t
    
    Returns:
        Tuple of (VaR, ES)
    """
    returns_arr = np.array(returns)
    mu = np.mean(returns_arr)
    sigma = np.std(returns_arr)
    alpha = 1 - confidence
    
    if distribution == 'normal':
        z_alpha = stats.norm.ppf(alpha)
        var = -(mu + sigma * z_alpha)
        es = -mu + sigma * stats.norm.pdf(z_alpha) / alpha
    else:  # Student-t
        df, loc, scale = stats.t.fit(returns_arr)
        t_alpha = stats.t.ppf(alpha, df)
        var = -(loc + scale * t_alpha)
        es = -loc + scale * (stats.t.pdf(t_alpha, df) / alpha) * (df + t_alpha**2) / (df - 1)
    
    return var, es

# Compare methods
print("Expected Shortfall Comparison (SPY, 95% confidence)")
print("=" * 55)

var_hist, es_hist = calculate_var_es(spy_returns, 0.95)
var_norm, es_norm = parametric_es(spy_returns, 0.95, 'normal')
var_t, es_t = parametric_es(spy_returns, 0.95, 't')

comparison = pd.DataFrame({
    'Method': ['Historical', 'Normal', 'Student-t'],
    'VaR (%)': [var_hist*100, var_norm*100, var_t*100],
    'ES (%)': [es_hist*100, es_norm*100, es_t*100],
    'ES/VaR': [es_hist/var_hist, es_norm/var_norm, es_t/var_t]
})
print(comparison.to_string(index=False))
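As a sanity check (an addition, using simulated data rather than SPY), the normal closed form can be verified against the empirical tail mean of a large simulated normal sample: the two estimates should agree to several decimal places.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, confidence = 0.0005, 0.01, 0.95   # illustrative daily parameters
alpha = 1 - confidence

sample = rng.normal(mu, sigma, 2_000_000)

# Closed-form ES (positive loss) for a normal distribution
z = stats.norm.ppf(alpha)
es_closed = -mu + sigma * stats.norm.pdf(z) / alpha

# Empirical ES: average of the worst alpha-fraction of simulated returns
threshold = np.percentile(sample, alpha * 100)
es_empirical = -sample[sample <= threshold].mean()

print(f"Closed-form ES: {es_closed:.5f}  Empirical ES: {es_empirical:.5f}")
```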

Exercise 8.1: Multi-Asset Expected Shortfall (Guided)

Your Task: Calculate and compare VaR and ES for all assets in the dataset. Find which asset has the highest ES/VaR ratio (fattest tail).

Fill in the blanks to complete the function:

Solution:
def analyze_tail_risk(returns: pd.DataFrame, confidence: float = 0.95) -> pd.DataFrame:
    results = []

    for asset in returns.columns:
        asset_returns = returns[asset]
        alpha = 1 - confidence

        var = -np.percentile(asset_returns, alpha * 100)

        threshold = np.percentile(asset_returns, alpha * 100)
        tail_returns = asset_returns[asset_returns <= threshold]

        es = -tail_returns.mean()

        results.append({
            'Asset': asset,
            'VaR': var,
            'ES': es,
            'ES_VaR_Ratio': es / var
        })

    return pd.DataFrame(results).sort_values('ES_VaR_Ratio', ascending=False)

risk_analysis = analyze_tail_risk(returns, 0.95)
print("Tail Risk Analysis (95% Confidence)")
print("=" * 50)
for _, row in risk_analysis.iterrows():
    print(f"{row['Asset']}: VaR={row['VaR']*100:.2f}%, ES={row['ES']*100:.2f}%, Ratio={row['ES_VaR_Ratio']:.2f}")
print(f"\nFattest tail: {risk_analysis.iloc[0]['Asset']}")

Section 8.2: Stress Testing

VaR and ES estimate risk based on historical patterns. But what about unprecedented events? Stress testing evaluates portfolio performance under extreme scenarios.

In this section, you will learn:

- Historical scenario analysis using past crises
- Hypothetical stress testing for unprecedented events
- Sensitivity analysis for factor changes

8.2.1 Historical Scenario Analysis

# Define historical crisis periods
crisis_periods = {
    'Global Financial Crisis': ('2008-09-01', '2008-11-30'),
    'Flash Crash 2010': ('2010-05-01', '2010-05-31'),
    'Euro Debt Crisis': ('2011-08-01', '2011-10-31'),
    'China Deval 2015': ('2015-08-01', '2015-09-30'),
    'COVID Crash': ('2020-02-15', '2020-03-31'),
    'Rate Hike 2022': ('2022-01-01', '2022-06-30')
}

def analyze_crisis_period(asset_returns: pd.Series, 
                          start_date: str, 
                          end_date: str) -> dict:
    """Calculate risk metrics for one asset's returns during a crisis period."""
    crisis_returns = asset_returns.loc[start_date:end_date]
    
    if len(crisis_returns) == 0:
        return None
    
    cumulative = (1 + crisis_returns).prod() - 1
    worst_day = crisis_returns.min()
    vol = crisis_returns.std() * np.sqrt(252)
    
    return {
        'days': len(crisis_returns),
        'cumulative': cumulative,
        'worst_day': worst_day,
        'annualized_vol': vol
    }

# Analyze each crisis for SPY
print("Historical Crisis Analysis: SPY")
print("=" * 70)

crisis_results = []
for crisis_name, (start, end) in crisis_periods.items():
    result = analyze_crisis_period(returns['SPY'], start, end)
    if result:
        crisis_results.append({
            'Crisis': crisis_name,
            'Days': result['days'],
            'Total Return': f"{result['cumulative']*100:.1f}%",
            'Worst Day': f"{result['worst_day']*100:.1f}%",
            'Ann. Vol': f"{result['annualized_vol']*100:.0f}%"
        })

df_crisis = pd.DataFrame(crisis_results)
print(df_crisis.to_string(index=False))
# Portfolio stress test across multiple allocations
def stress_test_portfolio(weights: dict, 
                          returns_df: pd.DataFrame, 
                          crisis_periods: dict) -> pd.DataFrame:
    """
    Apply historical crisis scenarios to a portfolio.
    
    Args:
        weights: Asset weights dict
        returns_df: DataFrame of historical returns
        crisis_periods: Dict of crisis name -> (start, end)
    
    Returns:
        DataFrame with stress test results
    """
    results = []
    
    assets = [a for a in weights.keys() if a in returns_df.columns]
    w = np.array([weights[a] for a in assets])
    w = w / w.sum()
    port_returns = pd.Series(returns_df[assets].values @ w, index=returns_df.index)
    
    for crisis_name, (start, end) in crisis_periods.items():
        crisis_ret = port_returns.loc[start:end]
        if len(crisis_ret) > 0:
            total_return = (1 + crisis_ret).prod() - 1
            # Drawdown on the compounded wealth curve (not additive cumsum)
            wealth = (1 + crisis_ret).cumprod()
            max_dd = (wealth / wealth.cummax() - 1).min()
            worst_day = crisis_ret.min()
            
            results.append({
                'Scenario': crisis_name,
                'Portfolio Return': total_return,
                'Worst Day': worst_day,
                'Max Drawdown': max_dd
            })
    
    return pd.DataFrame(results)

# Define test portfolios
portfolios = {
    'Aggressive': {'SPY': 0.7, 'QQQ': 0.3, 'TLT': 0.0, 'GLD': 0.0},
    'Balanced': {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1},
    'Conservative': {'SPY': 0.2, 'QQQ': 0.1, 'TLT': 0.5, 'GLD': 0.2}
}

print("Portfolio Stress Test Results")
print("=" * 70)

for port_name, port_weights in portfolios.items():
    print(f"\n{port_name} Portfolio:")
    results = stress_test_portfolio(port_weights, returns, crisis_periods)
    results['Portfolio Return'] = results['Portfolio Return'].apply(lambda x: f"{x*100:.1f}%")
    results['Worst Day'] = results['Worst Day'].apply(lambda x: f"{x*100:.1f}%")
    results['Max Drawdown'] = results['Max Drawdown'].apply(lambda x: f"{x*100:.1f}%")
    print(results.to_string(index=False))

8.2.2 Hypothetical Stress Testing

def hypothetical_stress_test(weights: dict, scenarios: dict) -> pd.DataFrame:
    """
    Apply hypothetical scenarios to a portfolio.
    
    Args:
        weights: Asset -> weight
        scenarios: Scenario name -> {asset: return}
    
    Returns:
        DataFrame with scenario impacts
    """
    results = []
    
    for scenario_name, asset_returns in scenarios.items():
        port_return = sum(weights.get(a, 0) * asset_returns.get(a, 0) 
                         for a in weights.keys())
        results.append({
            'Scenario': scenario_name,
            'Portfolio Impact': port_return
        })
    
    return pd.DataFrame(results)

# Define hypothetical scenarios
hypothetical_scenarios = {
    'Market Crash (-20%)': {'SPY': -0.20, 'QQQ': -0.25, 'TLT': 0.05, 'GLD': 0.03},
    'Tech Bubble Burst': {'SPY': -0.15, 'QQQ': -0.35, 'TLT': 0.08, 'GLD': 0.05},
    'Stagflation': {'SPY': -0.12, 'QQQ': -0.18, 'TLT': -0.15, 'GLD': 0.25},
    'Flash Crash (-10% day)': {'SPY': -0.10, 'QQQ': -0.12, 'TLT': 0.02, 'GLD': 0.01},
    'Bond Market Crisis': {'SPY': -0.05, 'QQQ': -0.05, 'TLT': -0.25, 'GLD': 0.10},
    'Dollar Collapse': {'SPY': -0.08, 'QQQ': -0.08, 'TLT': -0.10, 'GLD': 0.35},
}

print("Hypothetical Stress Test Results")
print("=" * 60)

for port_name, port_weights in portfolios.items():
    print(f"\n{port_name} Portfolio:")
    results = hypothetical_stress_test(port_weights, hypothetical_scenarios)
    results = results.sort_values('Portfolio Impact')
    results['Portfolio Impact'] = results['Portfolio Impact'].apply(lambda x: f"{x*100:+.1f}%")
    print(results.to_string(index=False))

Exercise 8.2: Custom Stress Scenario (Guided)

Your Task: Design and test a geopolitical crisis scenario that affects all assets.

Fill in the blanks to complete the implementation:

Solution:
def create_geopolitical_scenarios() -> dict:
    scenarios = {
        'Mild Tension': {
            'SPY': -0.05,
            'QQQ': -0.07,
            'TLT': 0.03,
            'GLD': 0.08
        },
        'Major Crisis': {
            'SPY': -0.15,
            'QQQ': -0.20,
            'TLT': 0.10,
            'GLD': 0.20
        },
        'Severe Conflict': {
            'SPY': -0.25,
            'QQQ': -0.30,
            'TLT': 0.08,
            'GLD': 0.35
        }
    }
    return scenarios

# Test
geo_scenarios = create_geopolitical_scenarios()
print("Geopolitical Crisis Stress Test")
print("=" * 50)

for port_name, port_weights in portfolios.items():
    results = hypothetical_stress_test(port_weights, geo_scenarios)
    worst = results['Portfolio Impact'].min()
    best = results['Portfolio Impact'].max()
    print(f"{port_name}: Best {best*100:+.1f}%, Worst {worst*100:.1f}%")

Section 8.3: Drawdown Analysis

VaR measures single-period risk. But investors often care more about cumulative losses over time—how far can the portfolio fall, and how long until recovery?

In this section, you will learn:

- Maximum drawdown calculation
- Drawdown duration analysis
- Conditional Drawdown at Risk (CDaR)
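Before the full implementation, the mechanics fit in a three-day worked example (illustrative numbers): returns of +10%, -20%, +5% give a wealth curve of 1.10, 0.88, 0.924, so the drawdown bottoms at -20% on day two and only partially recovers on day three.

```python
import pandas as pd

r = pd.Series([0.10, -0.20, 0.05])        # three daily returns
wealth = (1 + r).cumprod()                # 1.10, 0.88, 0.924
drawdown = wealth / wealth.cummax() - 1   # distance below the running peak
print(drawdown.round(3).tolist())         # [0.0, -0.2, -0.16]
```

The same cumprod/cummax pattern drives the production function below; the extra work there is locating the peak, trough, and recovery dates.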

def calculate_drawdowns(returns: pd.Series) -> dict:
    """
    Calculate drawdown series and statistics.
    
    Args:
        returns: Daily returns
    
    Returns:
        Dictionary with drawdown metrics and series
    """
    cum_returns = (1 + returns).cumprod()
    running_max = cum_returns.cummax()
    drawdown = (cum_returns - running_max) / running_max
    
    max_dd = drawdown.min()
    max_dd_end = drawdown.idxmin()
    peak_idx = cum_returns.loc[:max_dd_end].idxmax()
    
    # Find recovery
    peak_value = cum_returns.loc[peak_idx]
    after_trough = cum_returns.loc[max_dd_end:]
    recovery = after_trough[after_trough >= peak_value]
    recovery_date = recovery.index[0] if len(recovery) > 0 else None
    
    if recovery_date:
        duration = (recovery_date - peak_idx).days
    else:
        duration = (returns.index[-1] - peak_idx).days
    
    return {
        'max_drawdown': max_dd,
        'peak_date': peak_idx,
        'trough_date': max_dd_end,
        'recovery_date': recovery_date,
        'duration_days': duration,
        'drawdown_series': drawdown,
        'wealth_curve': cum_returns
    }

# Calculate for SPY
spy_dd = calculate_drawdowns(returns['SPY'])

print("SPY Maximum Drawdown Analysis")
print("=" * 50)
print(f"Maximum Drawdown: {spy_dd['max_drawdown']*100:.1f}%")
print(f"Peak Date: {spy_dd['peak_date'].strftime('%Y-%m-%d')}")
print(f"Trough Date: {spy_dd['trough_date'].strftime('%Y-%m-%d')}")
if spy_dd['recovery_date']:
    print(f"Recovery Date: {spy_dd['recovery_date'].strftime('%Y-%m-%d')}")
else:
    print("Recovery Date: Not yet recovered")
print(f"Duration: {spy_dd['duration_days']} days ({spy_dd['duration_days']/365:.1f} years)")
# Visualize drawdowns
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Top: Wealth curve
ax1 = axes[0]
ax1.plot(spy_dd['wealth_curve'], label='SPY Cumulative Return', linewidth=1.5)
ax1.fill_between(spy_dd['wealth_curve'].index, 
                 spy_dd['wealth_curve'].cummax(),
                 spy_dd['wealth_curve'],
                 alpha=0.3, color='red', label='Drawdown Area')
ax1.set_ylabel('Cumulative Return (Growth of $1)')
ax1.set_title('SPY Wealth Curve with Drawdown Periods')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)

# Bottom: Drawdown series
ax2 = axes[1]
ax2.fill_between(spy_dd['drawdown_series'].index, 
                 spy_dd['drawdown_series'] * 100, 
                 0, alpha=0.7, color='crimson')
ax2.axhline(spy_dd['max_drawdown'] * 100, color='darkred', linestyle='--', 
           label=f'Max DD: {spy_dd["max_drawdown"]*100:.1f}%')
ax2.set_ylabel('Drawdown (%)')
ax2.set_xlabel('Date')
ax2.set_title('SPY Drawdown History')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
def calculate_cdar(returns: pd.Series, confidence: float = 0.95) -> tuple:
    """
    Calculate Conditional Drawdown at Risk.
    
    Args:
        returns: Daily returns
        confidence: Confidence level
    
    Returns:
        Tuple of (DaR, CDaR)
    """
    cum_returns = (1 + returns).cumprod()
    running_max = cum_returns.cummax()
    drawdowns = (cum_returns - running_max) / running_max
    
    alpha = 1 - confidence
    dar = -np.percentile(drawdowns, alpha * 100)
    
    threshold = np.percentile(drawdowns, alpha * 100)
    worst_drawdowns = drawdowns[drawdowns <= threshold]
    cdar = -np.mean(worst_drawdowns)
    
    return dar, cdar

# Calculate for all assets
print("Drawdown Risk Metrics Comparison")
print("=" * 60)

dd_metrics = []
for asset in returns.columns:
    dd_info = calculate_drawdowns(returns[asset])
    dar_95, cdar_95 = calculate_cdar(returns[asset], 0.95)
    
    dd_metrics.append({
        'Asset': asset,
        'Max DD': f"{dd_info['max_drawdown']*100:.1f}%",
        'DaR (95%)': f"{dar_95*100:.1f}%",
        'CDaR (95%)': f"{cdar_95*100:.1f}%"
    })

df_dd = pd.DataFrame(dd_metrics)
print(df_dd.to_string(index=False))

Exercise 8.3: Portfolio Drawdown Comparison (Open-ended)

Your Task:

Build a function that compares drawdown characteristics across multiple portfolios:

- Calculate max drawdown, DaR, and CDaR for each portfolio
- Create a visualization showing drawdown series for all portfolios
- Rank portfolios by drawdown profile

Your implementation:

Solution:
def compare_portfolio_drawdowns(portfolios: dict, 
                                returns_df: pd.DataFrame) -> pd.DataFrame:
    """
    Compare drawdown metrics across portfolios.

    Args:
        portfolios: Dict of portfolio name -> weights dict
        returns_df: DataFrame of asset returns

    Returns:
        DataFrame with drawdown metrics for each portfolio
    """
    results = []
    dd_series_dict = {}

    for port_name, weights in portfolios.items():
        # Calculate portfolio returns
        assets = [a for a in weights.keys() if a in returns_df.columns]
        w = np.array([weights[a] for a in assets])
        w = w / w.sum()
        port_ret = pd.Series(returns_df[assets].values @ w, index=returns_df.index)

        # Calculate metrics
        dd_info = calculate_drawdowns(port_ret)
        dar, cdar = calculate_cdar(port_ret, 0.95)

        results.append({
            'Portfolio': port_name,
            'Max DD': dd_info['max_drawdown'] * 100,
            'DaR 95%': dar * 100,
            'CDaR 95%': cdar * 100,
            'Duration': dd_info['duration_days']
        })

        dd_series_dict[port_name] = dd_info['drawdown_series']

    # Plot comparison
    fig, ax = plt.subplots(figsize=(14, 6))
    colors = {'Aggressive': 'crimson', 'Balanced': 'steelblue', 'Conservative': 'green'}

    for port_name, dd_series in dd_series_dict.items():
        ax.plot(dd_series * 100, label=port_name, linewidth=1.5, 
               color=colors.get(port_name, 'gray'))

    ax.axhline(0, color='black', linestyle='-', linewidth=0.5)
    ax.set_ylabel('Drawdown (%)')
    ax.set_xlabel('Date')
    ax.set_title('Portfolio Drawdown Comparison')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

    return pd.DataFrame(results).sort_values('Max DD')

# Test
dd_comparison = compare_portfolio_drawdowns(portfolios, returns)
print("Portfolio Drawdown Comparison")
print("=" * 60)
print(dd_comparison.to_string(index=False))

Section 8.4: Tail Risk Measures

Beyond VaR and ES, several specialized metrics help quantify tail risk and return asymmetry.

In this section, you will learn:

- Tail ratio and gain/loss metrics
- Sortino ratio (downside deviation only)
- Omega ratio (full distribution)

def tail_ratio(returns: pd.Series, percentile: int = 5) -> float:
    """
    Calculate the tail ratio.
    
    Tail Ratio = 95th percentile / |5th percentile|
    > 1.0 means positive skew (gains > losses in tails)
    """
    returns_arr = np.array(returns)
    right_tail = np.percentile(returns_arr, 100 - percentile)
    left_tail = np.percentile(returns_arr, percentile)
    return right_tail / abs(left_tail) if left_tail != 0 else np.inf

def sortino_ratio(returns: pd.Series, risk_free_rate: float = 0) -> float:
    """
    Calculate Sortino ratio (only penalizes downside volatility).
    """
    returns_arr = np.array(returns)
    # Excess returns over the (daily) risk-free rate, used consistently
    # in both the numerator and the downside deviation
    excess = returns_arr - risk_free_rate / 252
    downside = excess[excess < 0]
    downside_std = np.std(downside) * np.sqrt(252) if len(downside) > 0 else 1e-10
    return (np.mean(excess) * 252) / downside_std

def omega_ratio(returns: pd.Series, threshold: float = 0) -> float:
    """
    Calculate Omega ratio.
    
    Omega = Probability-weighted gains above threshold / 
            Probability-weighted losses below threshold
    """
    returns_arr = np.array(returns)
    gains = returns_arr[returns_arr > threshold] - threshold
    losses = threshold - returns_arr[returns_arr <= threshold]
    
    sum_gains = np.sum(gains) if len(gains) > 0 else 0
    sum_losses = np.sum(losses) if len(losses) > 0 else 1e-10
    
    return sum_gains / sum_losses

# Calculate for all assets
print("Advanced Risk-Adjusted Metrics")
print("=" * 60)

advanced_metrics = []
for asset in returns.columns:
    ret = returns[asset]
    
    sharpe = (ret.mean() * 252) / (ret.std() * np.sqrt(252))
    sortino = sortino_ratio(ret)
    omega = omega_ratio(ret)
    tr = tail_ratio(ret)
    
    dd = calculate_drawdowns(ret)
    calmar = (ret.mean() * 252) / abs(dd['max_drawdown'])
    
    advanced_metrics.append({
        'Asset': asset,
        'Sharpe': sharpe,
        'Sortino': sortino,
        'Omega': omega,
        'Calmar': calmar,
        'Tail Ratio': tr
    })

df_advanced = pd.DataFrame(advanced_metrics)
print(df_advanced.to_string(index=False, float_format=lambda x: f"{x:.2f}"))

Exercise 8.4: Gain/Loss Analysis (Guided)

Your Task: Calculate gain/loss metrics including win rate, average gain/loss, and profit factor.

Fill in the blanks to complete the function:

Solution:
def gain_loss_analysis(returns: pd.Series) -> dict:
    returns_arr = np.array(returns)

    gains = returns_arr[returns_arr > 0]
    losses = returns_arr[returns_arr < 0]

    win_rate = len(gains) / len(returns_arr)

    avg_gain = np.mean(gains) if len(gains) > 0 else 0
    avg_loss = np.mean(losses) if len(losses) > 0 else 0

    gl_ratio = avg_gain / abs(avg_loss) if avg_loss != 0 else np.inf

    profit_factor = np.sum(gains) / abs(np.sum(losses)) if np.sum(losses) != 0 else np.inf

    return {
        'win_rate': win_rate,
        'avg_gain': avg_gain,
        'avg_loss': avg_loss,
        'gain_loss_ratio': gl_ratio,
        'profit_factor': profit_factor
    }

# Test for all assets
print("Gain/Loss Analysis")
print("=" * 60)
for asset in returns.columns:
    m = gain_loss_analysis(returns[asset])
    print(f"{asset}: Win={m['win_rate']*100:.1f}%, G/L={m['gain_loss_ratio']:.2f}, PF={m['profit_factor']:.2f}")

Exercise 8.5: Comprehensive Risk Report (Open-ended)

Your Task:

Build a function that generates a comprehensive risk report including:

- Basic statistics (return, volatility, skewness, kurtosis)
- VaR and ES at multiple confidence levels
- Drawdown metrics
- All risk-adjusted ratios (Sharpe, Sortino, Calmar, Omega)

Your implementation:

Solution:
def comprehensive_risk_report(returns: pd.Series, name: str = "Portfolio") -> dict:
    """
    Generate comprehensive risk report.

    Args:
        returns: Return series
        name: Name for display

    Returns:
        Dictionary with all risk metrics
    """
    ret = np.array(returns)

    # Basic stats
    ann_return = np.mean(ret) * 252
    ann_vol = np.std(ret) * np.sqrt(252)
    skewness = stats.skew(ret)
    kurtosis = stats.kurtosis(ret)

    # VaR and ES
    var_95, es_95 = calculate_var_es(ret, 0.95)
    var_99, es_99 = calculate_var_es(ret, 0.99)

    # Drawdown
    dd_info = calculate_drawdowns(returns)
    dar_95, cdar_95 = calculate_cdar(returns, 0.95)

    # Ratios
    sharpe = ann_return / ann_vol if ann_vol > 0 else 0
    sortino = sortino_ratio(ret)
    omega = omega_ratio(ret)
    calmar = ann_return / abs(dd_info['max_drawdown']) if dd_info['max_drawdown'] != 0 else 0
    tr = tail_ratio(ret)

    # Gain/Loss
    gl = gain_loss_analysis(ret)

    print(f"\n{'='*60}")
    print(f"RISK REPORT: {name}")
    print(f"{'='*60}")

    print(f"\nRETURN STATISTICS")
    print(f"  Annual Return:     {ann_return*100:>8.2f}%")
    print(f"  Annual Volatility: {ann_vol*100:>8.2f}%")
    print(f"  Skewness:          {skewness:>8.2f}")
    print(f"  Excess Kurtosis:   {kurtosis:>8.2f}")

    print(f"\nVALUE AT RISK")
    print(f"  VaR (95%):         {var_95*100:>8.2f}%")
    print(f"  ES (95%):          {es_95*100:>8.2f}%")
    print(f"  VaR (99%):         {var_99*100:>8.2f}%")
    print(f"  ES (99%):          {es_99*100:>8.2f}%")

    print(f"\nDRAWDOWN METRICS")
    print(f"  Max Drawdown:      {dd_info['max_drawdown']*100:>8.2f}%")
    print(f"  DaR (95%):         {dar_95*100:>8.2f}%")
    print(f"  CDaR (95%):        {cdar_95*100:>8.2f}%")

    print(f"\nRISK-ADJUSTED RATIOS")
    print(f"  Sharpe:            {sharpe:>8.2f}")
    print(f"  Sortino:           {sortino:>8.2f}")
    print(f"  Omega:             {omega:>8.2f}")
    print(f"  Calmar:            {calmar:>8.2f}")

    print(f"\nTAIL RISK")
    print(f"  Tail Ratio:        {tr:>8.2f}")
    print(f"  Win Rate:          {gl['win_rate']*100:>8.1f}%")
    print(f"  Profit Factor:     {gl['profit_factor']:>8.2f}")

    return {'ann_return': ann_return, 'ann_vol': ann_vol, 'sharpe': sharpe}

# Generate reports
for asset in returns.columns:
    comprehensive_risk_report(returns[asset], asset)

Exercise 8.6: Risk Dashboard Class (Open-ended)

Your Task:

Build a comprehensive RiskDashboard class that:

- Calculates all risk metrics (VaR, ES, drawdowns, ratios)
- Supports stress testing
- Generates visualizations
- Produces a summary report

Your implementation:

Solution:
class RiskDashboard:
    """
    Comprehensive risk analysis dashboard.
    """

    def __init__(self, returns: pd.Series, name: str = "Portfolio"):
        self.returns = returns
        self.name = name
        self._calculate_metrics()

    def _calculate_metrics(self):
        ret = np.array(self.returns)

        # Basic stats
        self.ann_return = np.mean(ret) * 252
        self.ann_vol = np.std(ret) * np.sqrt(252)
        self.skewness = stats.skew(ret)
        self.kurtosis = stats.kurtosis(ret)

        # VaR/ES
        self.var_95, self.es_95 = calculate_var_es(ret, 0.95)
        self.var_99, self.es_99 = calculate_var_es(ret, 0.99)

        # Drawdown
        dd = calculate_drawdowns(self.returns)
        self.max_drawdown = dd['max_drawdown']
        self.drawdown_series = dd['drawdown_series']
        self.dar_95, self.cdar_95 = calculate_cdar(self.returns, 0.95)

        # Ratios
        self.sharpe = self.ann_return / self.ann_vol if self.ann_vol > 0 else 0
        self.sortino = sortino_ratio(ret)
        self.omega = omega_ratio(ret)
        self.calmar = self.ann_return / abs(self.max_drawdown) if self.max_drawdown != 0 else 0
        self.tail_ratio = tail_ratio(ret)

    def stress_test(self, scenarios: dict) -> pd.DataFrame:
        """Apply stress scenarios."""
        results = []
        for name, impact in scenarios.items():
            results.append({'Scenario': name, 'Impact': impact})
        return pd.DataFrame(results)

    def plot_dashboard(self, figsize=(14, 10)):
        """Create visual dashboard."""
        fig, axes = plt.subplots(2, 2, figsize=figsize)
        fig.suptitle(f'Risk Dashboard: {self.name}', fontsize=14, fontweight='bold')

        # Distribution with VaR/ES
        ax1 = axes[0, 0]
        ax1.hist(self.returns, bins=50, density=True, alpha=0.7, color='steelblue')
        ax1.axvline(-self.var_95, color='orange', linestyle='--', linewidth=2)
        ax1.axvline(-self.es_95, color='red', linestyle='-', linewidth=2)
        ax1.set_xlabel('Daily Return')
        ax1.set_title('Return Distribution with Risk Measures')

        # Drawdown
        ax2 = axes[0, 1]
        ax2.fill_between(self.drawdown_series.index, self.drawdown_series * 100, 0, 
                        alpha=0.7, color='crimson')
        ax2.set_xlabel('Date')
        ax2.set_ylabel('Drawdown (%)')
        ax2.set_title('Historical Drawdowns')

        # Rolling volatility
        ax3 = axes[1, 0]
        rolling_vol = self.returns.rolling(21).std() * np.sqrt(252) * 100
        ax3.plot(rolling_vol.index, rolling_vol, linewidth=1)
        ax3.set_xlabel('Date')
        ax3.set_ylabel('Volatility (%)')
        ax3.set_title('Rolling 21-day Volatility')

        # Metrics summary
        ax4 = axes[1, 1]
        ax4.axis('off')
        summary = f"""
        RISK METRICS SUMMARY
        {'='*30}

        Return & Volatility
        Annual Return:      {self.ann_return*100:>8.2f}%
        Annual Volatility:  {self.ann_vol*100:>8.2f}%

        Downside Risk
        VaR (95%):          {self.var_95*100:>8.2f}%
        ES (95%):           {self.es_95*100:>8.2f}%
        Max Drawdown:       {self.max_drawdown*100:>8.2f}%

        Risk-Adjusted Ratios
        Sharpe:             {self.sharpe:>8.2f}
        Sortino:            {self.sortino:>8.2f}
        Calmar:             {self.calmar:>8.2f}
        """
        ax4.text(0.1, 0.95, summary, transform=ax4.transAxes, fontsize=10,
                verticalalignment='top', fontfamily='monospace')

        plt.tight_layout()
        plt.subplots_adjust(top=0.92)
        plt.show()

# Test
dashboard = RiskDashboard(portfolio_returns, "Balanced Portfolio")
dashboard.plot_dashboard()

Module Project: Production Risk Management System

Build a comprehensive risk management system that integrates all concepts from this module.

# YOUR CODE HERE - Module Project
Solution:
class ProductionRiskSystem:
    """
    Production-ready risk management system.

    Features:
    - VaR and Expected Shortfall calculation
    - Stress testing (historical and hypothetical)
    - Drawdown analysis
    - Comprehensive risk-adjusted metrics
    - Portfolio comparison
    """

    def __init__(self, returns: pd.DataFrame, 
                 weights: np.ndarray = None,
                 portfolio_value: float = 1_000_000):
        self.returns = returns
        self.assets = list(returns.columns)
        self.weights = weights if weights is not None else \
                       np.ones(len(self.assets)) / len(self.assets)
        self.portfolio_value = portfolio_value
        self.portfolio_returns = pd.Series(
            returns.values @ self.weights, 
            index=returns.index
        )
        self._calculate_all_metrics()

    def _calculate_all_metrics(self):
        """Calculate all risk metrics."""
        ret = np.array(self.portfolio_returns)

        # Basic
        self.ann_return = np.mean(ret) * 252
        self.ann_vol = np.std(ret) * np.sqrt(252)

        # VaR/ES
        self.var_95, self.es_95 = calculate_var_es(ret, 0.95)
        self.var_99, self.es_99 = calculate_var_es(ret, 0.99)

        # Drawdown
        dd = calculate_drawdowns(self.portfolio_returns)
        self.max_drawdown = dd['max_drawdown']
        self.drawdown_series = dd['drawdown_series']
        self.dar_95, self.cdar_95 = calculate_cdar(self.portfolio_returns, 0.95)

        # Ratios
        self.sharpe = self.ann_return / self.ann_vol if self.ann_vol > 0 else 0
        self.sortino = sortino_ratio(ret)
        self.omega = omega_ratio(ret)
        self.calmar = self.ann_return / abs(self.max_drawdown) if self.max_drawdown != 0 else 0

    def var_report(self):
        """Generate VaR report."""
        print("\nVALUE AT RISK REPORT")
        print("=" * 50)
        print(f"Portfolio Value: ${self.portfolio_value:,.0f}")
        print(f"\n95% Confidence:")
        print(f"  VaR:  {self.var_95*100:.2f}% (${self.var_95*self.portfolio_value:,.0f})")
        print(f"  ES:   {self.es_95*100:.2f}% (${self.es_95*self.portfolio_value:,.0f})")
        print(f"\n99% Confidence:")
        print(f"  VaR:  {self.var_99*100:.2f}% (${self.var_99*self.portfolio_value:,.0f})")
        print(f"  ES:   {self.es_99*100:.2f}% (${self.es_99*self.portfolio_value:,.0f})")

    def stress_test(self, scenarios: dict):
        """Run stress tests."""
        print("\nSTRESS TEST RESULTS")
        print("=" * 50)
        for name, asset_impacts in scenarios.items():
            impact = sum(self.weights[i] * asset_impacts.get(a, 0) 
                        for i, a in enumerate(self.assets))
            dollar_impact = impact * self.portfolio_value
            print(f"{name}: {impact*100:+.1f}% (${dollar_impact:+,.0f})")

    def drawdown_report(self):
        """Generate drawdown report."""
        print("\nDRAWDOWN REPORT")
        print("=" * 50)
        print(f"Max Drawdown:  {self.max_drawdown*100:.1f}%")
        print(f"DaR (95%):     {self.dar_95*100:.1f}%")
        print(f"CDaR (95%):    {self.cdar_95*100:.1f}%")

    def performance_report(self):
        """Generate performance report."""
        print("\nPERFORMANCE REPORT")
        print("=" * 50)
        print(f"Annual Return:    {self.ann_return*100:.2f}%")
        print(f"Annual Volatility: {self.ann_vol*100:.2f}%")
        print(f"\nRisk-Adjusted Ratios:")
        print(f"  Sharpe:  {self.sharpe:.2f}")
        print(f"  Sortino: {self.sortino:.2f}")
        print(f"  Omega:   {self.omega:.2f}")
        print(f"  Calmar:  {self.calmar:.2f}")

    def full_report(self):
        """Generate comprehensive report."""
        print("\n" + "=" * 60)
        print("PRODUCTION RISK MANAGEMENT REPORT")
        print("=" * 60)
        print(f"\nPortfolio: {dict(zip(self.assets, self.weights))}")
        print(f"Portfolio Value: ${self.portfolio_value:,.0f}")

        self.performance_report()
        self.var_report()
        self.drawdown_report()

        # Stress test with default scenarios
        scenarios = {
            'Market Crash': {'SPY': -0.20, 'QQQ': -0.25, 'TLT': 0.05, 'GLD': 0.03},
            'Stagflation': {'SPY': -0.12, 'QQQ': -0.18, 'TLT': -0.15, 'GLD': 0.25}
        }
        self.stress_test(scenarios)

# Test
system = ProductionRiskSystem(
    returns=returns,
    weights=weights,
    portfolio_value=1_000_000
)
system.full_report()

Key Takeaways

What You Learned

1. Expected Shortfall (CVaR)

  • Measures average loss when VaR is breached
  • Preferred by regulators for coherence properties
  • Always greater than or equal to VaR
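
The ES-VaR relationship can be checked with a minimal historical-simulation sketch. This is illustrative only: the helper name `hist_var_es` and the simulated data are hypothetical, not the module's `calculate_var_es`.

```python
import numpy as np

def hist_var_es(returns: np.ndarray, confidence: float = 0.95):
    """Historical VaR and Expected Shortfall, reported as positive loss fractions."""
    losses = -np.asarray(returns)              # convert returns to losses
    var = np.quantile(losses, confidence)      # loss exceeded (1 - confidence) of the time
    es = losses[losses >= var].mean()          # average loss in the tail beyond VaR
    return var, es

rng = np.random.default_rng(42)
rets = rng.normal(0.0005, 0.01, 2500)          # simulated daily returns

var95, es95 = hist_var_es(rets, 0.95)
print(f"VaR 95%: {var95:.4f}  ES 95%: {es95:.4f}")
assert es95 >= var95                           # ES is always at least VaR
```

Because ES averages over the tail beyond the VaR cutoff, the inequality holds for any return sample and any confidence level.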

2. Stress Testing

  • Historical scenarios apply real crisis returns
  • Hypothetical scenarios test unprecedented events
  • No single portfolio is safe in all scenarios

3. Drawdown Analysis

  • Maximum drawdown captures peak-to-trough loss
  • Duration matters as much as magnitude
  • CDaR extends ES concept to drawdowns
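
Both the magnitude and the duration points can be sketched from a return series in a few lines. This is a standalone illustration on simulated data, not the module's `calculate_drawdowns`.

```python
import numpy as np
import pandas as pd

def max_drawdown_stats(returns: pd.Series) -> dict:
    """Peak-to-trough loss and the longest stretch spent below a prior peak."""
    wealth = (1 + returns).cumprod()          # cumulative wealth index
    peak = wealth.cummax()                    # running peak
    drawdown = wealth / peak - 1              # drawdown series (<= 0)
    underwater = drawdown < 0
    # lengths of consecutive underwater runs: group by the count of non-underwater days
    runs = underwater.astype(int).groupby((~underwater).cumsum()).sum()
    return {
        'max_drawdown': drawdown.min(),
        'longest_underwater_days': int(runs.max()) if len(runs) else 0,
    }

rng = np.random.default_rng(0)
rets = pd.Series(rng.normal(0.0004, 0.01, 1000))
stats = max_drawdown_stats(rets)
print(stats)
```

The run-length trick groups underwater days by how many recovery days precede them, so each group sum is the length of one underwater episode.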

4. Tail Risk Measures

  • Tail ratio compares extreme gains to losses
  • Sortino penalizes only downside volatility
  • Omega considers the entire distribution
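
The downside-only measures can be sketched directly from their definitions. These are illustrative versions with an assumed zero threshold, not the module's `sortino_ratio` / `omega_ratio`.

```python
import numpy as np

def sortino(returns: np.ndarray, target: float = 0.0) -> float:
    """Annualized mean excess return over annualized downside deviation below target."""
    excess = returns - target
    downside = np.minimum(excess, 0.0)                     # only below-target outcomes
    downside_dev = np.sqrt(np.mean(downside ** 2)) * np.sqrt(252)
    return np.mean(excess) * 252 / downside_dev

def omega(returns: np.ndarray, target: float = 0.0) -> float:
    """Sum of gains over sum of losses relative to the target."""
    excess = returns - target
    gains = excess[excess > 0].sum()
    losses = -excess[excess < 0].sum()
    return gains / losses

rng = np.random.default_rng(1)
rets = rng.normal(0.0005, 0.01, 2500)
print(f"Sortino: {sortino(rets):.2f}  Omega: {omega(rets):.2f}")
```

Note how Sortino ignores upside volatility entirely, while Omega weighs the whole distribution through the gain and loss sums.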

Coming Up Next

In Module 9: Factor Models, we'll explore:

- CAPM and beta estimation
- Fama-French multi-factor models
- Factor-based portfolio construction
- Style analysis and attribution


Congratulations on completing Module 8!

Module 9: Factor Models

Course 3: Quantitative Finance & Portfolio Theory
Part 3: Risk Modeling


Learning Objectives

By the end of this module, you will be able to:

  1. Understand and implement CAPM for beta estimation
  2. Apply multi-factor models (Fama-French 3-factor and 5-factor)
  3. Calculate alpha and evaluate statistical significance
  4. Perform factor-based portfolio analysis and attribution
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 8: Beyond VaR

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', '{:.4f}'.format)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Libraries loaded successfully!')

Load Data

# Download stock and market data
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'JNJ', 'JPM', 'XOM']
market_ticker = 'SPY'

end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)

print("Downloading market and stock data...")

all_tickers = tickers + [market_ticker]
data = yf.download(all_tickers, start=start_date, end=end_date, progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    elif 'Close' in data.columns.get_level_values(0):
        prices = data['Close']
    else:
        prices = data.iloc[:, :len(all_tickers)]
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

prices.columns = [str(col) for col in prices.columns]

# Calculate returns
returns = prices.pct_change().dropna()

# Separate market and stock returns
market_returns = returns[market_ticker]
stock_returns = returns[tickers]

# Risk-free rate
risk_free_rate = 0.04
daily_rf = risk_free_rate / 252

# Calculate excess returns
market_excess = market_returns - daily_rf
stock_excess = stock_returns.sub(daily_rf, axis=0)

print(f"\nData loaded: {len(prices)} trading days")
print(f"Stocks: {tickers}")
print(f"Market proxy: {market_ticker}")

Section 9.1: CAPM and Beta

The Capital Asset Pricing Model (CAPM) tells us that expected return is determined by a single factor—market risk (beta).

$$E[R_i] = R_f + \beta_i (E[R_m] - R_f)$$

In this section, you will learn:

- Beta calculation methods
- Rolling beta for time-varying risk
- Interpreting beta components
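
As a quick sanity check on the CAPM relation, we can plug in assumed numbers (a 4% risk-free rate, a 10% expected market return, and a beta of 1.2; all hypothetical):

```python
# E[R_i] = R_f + beta * (E[R_m] - R_f)
r_f = 0.04          # assumed risk-free rate
e_r_m = 0.10        # assumed expected market return
beta = 1.2          # assumed beta

expected_return = r_f + beta * (e_r_m - r_f)
print(f"Expected return: {expected_return:.1%}")  # → 11.2%
```

A beta above 1 amplifies the market risk premium, so the expected return sits above the market's 10%.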

9.1.1 Beta Calculation

Beta measures sensitivity to market movements:

$$\beta_i = \frac{Cov(R_i, R_m)}{Var(R_m)} = \rho_{i,m} \cdot \frac{\sigma_i}{\sigma_m}$$

def calculate_beta_cov(stock_returns: pd.Series, 
                       market_returns: pd.Series) -> float:
    """Calculate beta using covariance method."""
    cov = stock_returns.cov(market_returns)
    var = market_returns.var()
    return cov / var

def calculate_beta_regression(stock_returns: pd.Series, 
                              market_returns: pd.Series) -> tuple:
    """Calculate beta using OLS regression."""
    X = sm.add_constant(market_returns)
    model = sm.OLS(stock_returns, X).fit()
    return model.params.iloc[1], model

# Calculate betas for all stocks
print("Beta Calculation Comparison")
print("=" * 60)
print(f"{'Ticker':<8} {'Cov/Var':>10} {'Regression':>12} {'R-squared':>12}")
print("-" * 60)

betas = {}
models = {}
for ticker in tickers:
    beta_cov = calculate_beta_cov(stock_returns[ticker], market_returns)
    beta_reg, model = calculate_beta_regression(stock_excess[ticker], market_excess)
    betas[ticker] = beta_reg
    models[ticker] = model
    
    print(f"{ticker:<8} {beta_cov:>10.3f} {beta_reg:>12.3f} {model.rsquared:>12.3f}")
# Visualize the Security Market Line
plt.figure(figsize=(12, 8))

# Calculate annual market return and risk premium
annual_market_return = market_returns.mean() * 252
market_risk_premium = annual_market_return - risk_free_rate

# SML line
beta_range = np.linspace(0, 2.5, 100)
sml_returns = risk_free_rate + beta_range * market_risk_premium
plt.plot(beta_range, sml_returns, 'b-', linewidth=2, label='Security Market Line')

# Plot each stock
for ticker in tickers:
    beta = betas[ticker]
    actual_return = stock_returns[ticker].mean() * 252
    expected_return = risk_free_rate + beta * market_risk_premium
    
    color = 'green' if actual_return > expected_return else 'red'
    plt.scatter(beta, actual_return, s=150, c=color, edgecolors='black', 
               linewidth=1.5, zorder=5)
    plt.annotate(ticker, (beta, actual_return), xytext=(5, 5), 
                textcoords='offset points', fontsize=10)

plt.scatter(1, annual_market_return, s=200, marker='D', c='blue', 
           edgecolors='black', linewidth=2, zorder=5, label='Market (β=1)')

plt.xlabel('Beta (β)', fontsize=12)
plt.ylabel('Annual Return', fontsize=12)
plt.title('Security Market Line\nGreen = Positive Alpha, Red = Negative Alpha', 
         fontsize=14, fontweight='bold')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

9.1.2 Rolling Beta

Beta is not constant over time. Rolling beta shows how market sensitivity evolves.

def calculate_rolling_beta(stock_returns: pd.Series, 
                           market_returns: pd.Series, 
                           window: int = 252) -> pd.Series:
    """Calculate rolling beta over a specified window."""
    rolling_cov = stock_returns.rolling(window=window).cov(market_returns)
    rolling_var = market_returns.rolling(window=window).var()
    return rolling_cov / rolling_var

# Calculate rolling betas
window = 252
rolling_betas = pd.DataFrame()

for ticker in tickers:
    rolling_betas[ticker] = calculate_rolling_beta(
        stock_returns[ticker], market_returns, window
    )

rolling_betas = rolling_betas.dropna()

# Plot rolling betas
fig, ax = plt.subplots(figsize=(14, 8))

for ticker in tickers:
    ax.plot(rolling_betas.index, rolling_betas[ticker], label=ticker, alpha=0.7)

ax.axhline(y=1, color='black', linestyle='--', linewidth=2, alpha=0.5, label='Market (β=1)')
ax.set_xlabel('Date')
ax.set_ylabel('Rolling Beta (252-day)')
ax.set_title('Rolling Beta Over Time', fontsize=14, fontweight='bold')
ax.legend(loc='upper right', ncol=3)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Exercise 9.1: Beta Component Analysis (Guided)

Your Task: Break down beta into its components: correlation and relative volatility.

$$\beta = \rho_{i,m} \times \frac{\sigma_i}{\sigma_m}$$

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def beta_components(stock_returns: pd.Series, 
                    market_returns: pd.Series) -> dict:
    correlation = stock_returns.corr(market_returns)
    stock_vol = stock_returns.std()
    market_vol = market_returns.std()
    vol_ratio = stock_vol / market_vol
    beta = correlation * vol_ratio

    return {
        'beta': beta,
        'correlation': correlation,
        'vol_ratio': vol_ratio,
        'stock_vol': stock_vol * np.sqrt(252),
        'market_vol': market_vol * np.sqrt(252)
    }

# Test for all stocks
print("Beta Component Analysis")
print("=" * 70)
print(f"{'Stock':<8} {'Beta':>8} {'Corr':>8} {'Vol Ratio':>10} {'Stock Vol':>10}")
print("-" * 70)

for ticker in tickers:
    result = beta_components(stock_returns[ticker], market_returns)
    print(f"{ticker:<8} {result['beta']:>8.3f} {result['correlation']:>8.3f} "
          f"{result['vol_ratio']:>10.2f} {result['stock_vol']*100:>9.1f}%")

Section 9.2: Multi-Factor Models

CAPM uses only one factor (the market). The Fama-French models add factors that help explain the cross-section of returns.

In this section, you will learn:

- Fama-French 3-factor model (Market, SMB, HML)
- Fama-French 5-factor model (adds RMW, CMA)
- Factor loading interpretation

9.2.1 Fama-French Factor Construction

We'll simulate Fama-French factors for demonstration (in practice, use data from Kenneth French's library).

# Simulate Fama-French factors based on market returns
np.random.seed(42)
n_days = len(market_returns)

# SMB (Small Minus Big) - size factor
smb = pd.Series(
    np.random.normal(0.0001, 0.006, n_days) + market_returns.values * 0.2,
    index=market_returns.index,
    name='SMB'
)

# HML (High Minus Low) - value factor
hml = pd.Series(
    np.random.normal(0.0001, 0.005, n_days) - market_returns.values * 0.1,
    index=market_returns.index,
    name='HML'
)

# RMW (Robust Minus Weak) - profitability factor
rmw = pd.Series(
    np.random.normal(0.0001, 0.004, n_days),
    index=market_returns.index,
    name='RMW'
)

# CMA (Conservative Minus Aggressive) - investment factor
cma = pd.Series(
    np.random.normal(0.0001, 0.004, n_days),
    index=market_returns.index,
    name='CMA'
)

# Create factor DataFrame
factors_3 = pd.DataFrame({
    'MKT': market_excess,
    'SMB': smb,
    'HML': hml
})

factors_5 = pd.DataFrame({
    'MKT': market_excess,
    'SMB': smb,
    'HML': hml,
    'RMW': rmw,
    'CMA': cma
})

print("Fama-French Factors (Simulated)")
print("=" * 50)
print("\nFactor Statistics (Annualized):")
for col in factors_5.columns:
    mean = factors_5[col].mean() * 252 * 100
    std = factors_5[col].std() * np.sqrt(252) * 100
    print(f"  {col}: Mean={mean:.2f}%, Vol={std:.2f}%")

9.2.2 Three-Factor Model

$$R_i - R_f = \alpha + \beta_{MKT}(R_m - R_f) + \beta_{SMB} \cdot SMB + \beta_{HML} \cdot HML + \epsilon$$

def fit_factor_model(stock_excess: pd.Series, 
                     factors: pd.DataFrame) -> dict:
    """
    Fit a multi-factor model.
    
    Args:
        stock_excess: Excess returns of the stock
        factors: DataFrame of factor returns
    
    Returns:
        Dictionary with model results
    """
    X = sm.add_constant(factors)
    model = sm.OLS(stock_excess, X).fit()
    
    return {
        'alpha': model.params.iloc[0],
        'alpha_annual': model.params.iloc[0] * 252,
        'alpha_tstat': model.tvalues.iloc[0],
        'alpha_pvalue': model.pvalues.iloc[0],
        'betas': model.params.iloc[1:].to_dict(),
        'r_squared': model.rsquared,
        'model': model
    }

# Fit 3-factor model for all stocks
print("Fama-French 3-Factor Model Results")
print("=" * 80)
print(f"{'Stock':<8} {'Alpha(ann)':>12} {'MKT':>8} {'SMB':>8} {'HML':>8} {'R²':>8}")
print("-" * 80)

ff3_results = {}
for ticker in tickers:
    result = fit_factor_model(stock_excess[ticker], factors_3)
    ff3_results[ticker] = result
    
    print(f"{ticker:<8} {result['alpha_annual']*100:>11.2f}% "
          f"{result['betas']['MKT']:>8.3f} "
          f"{result['betas']['SMB']:>8.3f} "
          f"{result['betas']['HML']:>8.3f} "
          f"{result['r_squared']:>8.3f}")

9.2.3 Five-Factor Model

# Fit 5-factor model for all stocks
print("Fama-French 5-Factor Model Results")
print("=" * 100)
print(f"{'Stock':<8} {'Alpha':>10} {'MKT':>8} {'SMB':>8} {'HML':>8} {'RMW':>8} {'CMA':>8} {'R²':>8}")
print("-" * 100)

ff5_results = {}
for ticker in tickers:
    result = fit_factor_model(stock_excess[ticker], factors_5)
    ff5_results[ticker] = result
    
    print(f"{ticker:<8} {result['alpha_annual']*100:>9.2f}% "
          f"{result['betas']['MKT']:>8.3f} "
          f"{result['betas']['SMB']:>8.3f} "
          f"{result['betas']['HML']:>8.3f} "
          f"{result['betas']['RMW']:>8.3f} "
          f"{result['betas']['CMA']:>8.3f} "
          f"{result['r_squared']:>8.3f}")

Exercise 9.2: Factor Model Comparison (Guided)

Your Task: Compare CAPM vs 3-factor vs 5-factor models for a stock.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def compare_factor_models(stock_excess: pd.Series,
                          market_excess: pd.Series,
                          factors_3: pd.DataFrame,
                          factors_5: pd.DataFrame) -> pd.DataFrame:
    results = []

    # CAPM
    X_capm = sm.add_constant(market_excess)
    model_capm = sm.OLS(stock_excess, X_capm).fit()
    results.append({
        'Model': 'CAPM',
        'Alpha': model_capm.params.iloc[0] * 252 * 100,
        'R_squared': model_capm.rsquared,
        'AIC': model_capm.aic
    })

    # 3-Factor
    X_3f = sm.add_constant(factors_3)
    model_3f = sm.OLS(stock_excess, X_3f).fit()
    results.append({
        'Model': '3-Factor',
        'Alpha': model_3f.params.iloc[0] * 252 * 100,
        'R_squared': model_3f.rsquared,
        'AIC': model_3f.aic
    })

    # 5-Factor
    X_5f = sm.add_constant(factors_5)
    model_5f = sm.OLS(stock_excess, X_5f).fit()
    results.append({
        'Model': '5-Factor',
        'Alpha': model_5f.params.iloc[0] * 252 * 100,
        'R_squared': model_5f.rsquared,
        'AIC': model_5f.aic
    })

    return pd.DataFrame(results)

# Test
for ticker in ['AAPL', 'TSLA', 'JNJ']:
    print(f"\n{ticker} Model Comparison:")
    comp = compare_factor_models(stock_excess[ticker], market_excess, factors_3, factors_5)
    print(comp.to_string(index=False))

Section 9.3: Alpha Analysis

Alpha is the intercept in a factor model—the return not explained by factor exposures.

In this section, you will learn:

- Jensen's alpha interpretation
- Statistical significance testing
- Information ratio

# Alpha analysis with significance
print("Alpha Significance Analysis")
print("=" * 70)
print(f"{'Stock':<8} {'Alpha (ann)':>12} {'t-stat':>10} {'p-value':>10} {'Significant':>12}")
print("-" * 70)

for ticker in tickers:
    result = ff3_results[ticker]
    sig = "Yes" if result['alpha_pvalue'] < 0.05 else "No"
    
    print(f"{ticker:<8} {result['alpha_annual']*100:>11.2f}% "
          f"{result['alpha_tstat']:>10.3f} "
          f"{result['alpha_pvalue']:>10.4f} "
          f"{sig:>12}")
# Calculate Information Ratio
def information_ratio(stock_excess: pd.Series, model_result: dict) -> float:
    """
    Calculate Information Ratio.
    
    IR = Alpha / Tracking Error
    """
    alpha_annual = model_result['alpha_annual']
    residuals = model_result['model'].resid
    tracking_error = residuals.std() * np.sqrt(252)
    return alpha_annual / tracking_error

print("Information Ratio Analysis")
print("=" * 55)
print(f"{'Stock':<8} {'Alpha':>10} {'Track Error':>12} {'Info Ratio':>12}")
print("-" * 55)

for ticker in tickers:
    result = ff3_results[ticker]
    ir = information_ratio(stock_excess[ticker], result)
    te = result['model'].resid.std() * np.sqrt(252)
    
    print(f"{ticker:<8} {result['alpha_annual']*100:>9.2f}% "
          f"{te*100:>11.2f}% "
          f"{ir:>12.3f}")

print("\nInformation Ratio Interpretation:")
print("  > 0.5: Good | > 1.0: Excellent | > 1.5: Exceptional")

Exercise 9.3: Rolling Alpha Analysis (Open-ended)

Your Task:

Build a function that calculates rolling alpha with significance:

- Calculate alpha using a rolling window
- Track t-statistics over time
- Identify periods of significant alpha
- Visualize the results

Your implementation:

Exercise
Click to reveal solution
def rolling_alpha_analysis(stock_excess: pd.Series,
                           factors: pd.DataFrame,
                           window: int = 252) -> pd.DataFrame:
    """
    Calculate rolling alpha with significance.

    Args:
        stock_excess: Stock excess returns
        factors: Factor returns
        window: Rolling window

    Returns:
        DataFrame with rolling alpha and t-stats
    """
    alphas = []
    tstats = []
    dates = []

    for i in range(window, len(stock_excess)):
        y = stock_excess.iloc[i-window:i]
        X = sm.add_constant(factors.iloc[i-window:i])

        try:
            model = sm.OLS(y, X).fit()
            alphas.append(model.params.iloc[0] * 252)  # Annualized
            tstats.append(model.tvalues.iloc[0])
            dates.append(stock_excess.index[i])
        except Exception:  # keep NaN if the window's regression fails
            alphas.append(np.nan)
            tstats.append(np.nan)
            dates.append(stock_excess.index[i])

    return pd.DataFrame({
        'alpha': alphas,
        't_stat': tstats
    }, index=dates)

# Calculate for AAPL
rolling_alpha_df = rolling_alpha_analysis(stock_excess['AAPL'], factors_3, 252)

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

ax1 = axes[0]
ax1.plot(rolling_alpha_df.index, rolling_alpha_df['alpha'] * 100, linewidth=1.5)
ax1.axhline(0, color='black', linestyle='--', alpha=0.5)
ax1.fill_between(rolling_alpha_df.index, 0, rolling_alpha_df['alpha'] * 100,
                where=rolling_alpha_df['alpha'] > 0, alpha=0.3, color='green')
ax1.fill_between(rolling_alpha_df.index, 0, rolling_alpha_df['alpha'] * 100,
                where=rolling_alpha_df['alpha'] <= 0, alpha=0.3, color='red')
ax1.set_ylabel('Alpha (%)')
ax1.set_title('AAPL Rolling Alpha (252-day)')

ax2 = axes[1]
ax2.plot(rolling_alpha_df.index, rolling_alpha_df['t_stat'], linewidth=1.5)
ax2.axhline(1.96, color='red', linestyle='--', label='t=1.96')
ax2.axhline(-1.96, color='red', linestyle='--')
ax2.axhline(0, color='black', linestyle='-', alpha=0.3)
ax2.set_ylabel('t-statistic')
ax2.set_xlabel('Date')
ax2.set_title('Alpha Significance')
ax2.legend()

plt.tight_layout()
plt.show()

# Count significant periods
sig_positive = (rolling_alpha_df['t_stat'] > 1.96).sum()
sig_negative = (rolling_alpha_df['t_stat'] < -1.96).sum()
total = len(rolling_alpha_df)
print(f"Significant positive alpha: {sig_positive/total*100:.1f}% of days")
print(f"Significant negative alpha: {sig_negative/total*100:.1f}% of days")

Section 9.4: Factor Attribution

Factor attribution decomposes portfolio returns into factor contributions.

In this section, you will learn:

- Return decomposition by factors
- Factor contribution analysis
- Style analysis

def factor_attribution(stock_excess: pd.Series, 
                       factors: pd.DataFrame,
                       model_result: dict) -> dict:
    """
    Decompose returns by factor contributions.
    
    Args:
        stock_excess: Stock excess returns
        factors: Factor returns
        model_result: Fitted model result
    
    Returns:
        Dictionary with factor contributions
    """
    betas = model_result['betas']
    alpha = model_result['alpha']
    
    # Total return
    total_return = stock_excess.mean() * 252
    
    # Factor contributions
    contributions = {}
    for factor, beta in betas.items():
        factor_return = factors[factor].mean() * 252
        contributions[factor] = beta * factor_return
    
    # Alpha contribution
    contributions['Alpha'] = alpha * 252
    
    return {
        'total_return': total_return,
        'contributions': contributions
    }

# Attribution for all stocks
print("Factor Attribution Analysis (Annualized)")
print("=" * 80)

for ticker in ['AAPL', 'TSLA', 'JNJ']:
    attr = factor_attribution(stock_excess[ticker], factors_3, ff3_results[ticker])
    
    print(f"\n{ticker}:")
    print(f"  Total Excess Return: {attr['total_return']*100:.2f}%")
    print("  Contributions:")
    for factor, contrib in attr['contributions'].items():
        print(f"    {factor}: {contrib*100:+.2f}%")
# Visualize factor contributions
def plot_factor_attribution(ticker: str, attr: dict):
    """Create waterfall chart for factor attribution."""
    fig, ax = plt.subplots(figsize=(10, 6))
    
    contributions = attr['contributions']
    labels = list(contributions.keys())
    values = [v * 100 for v in contributions.values()]
    
    colors = ['green' if v > 0 else 'red' for v in values]
    
    bars = ax.bar(labels, values, color=colors, edgecolor='black', alpha=0.7)
    
    # Add total
    total = sum(values)
    ax.bar('Total', total, color='blue', edgecolor='black', alpha=0.7)
    
    ax.axhline(0, color='black', linewidth=0.5)
    ax.set_ylabel('Contribution (%)')
    ax.set_title(f'{ticker} Factor Attribution')
    
    # Add value labels
    for bar, val in zip(bars, values):
        ax.annotate(f'{val:+.2f}%', 
                   xy=(bar.get_x() + bar.get_width()/2, val),
                   ha='center', va='bottom' if val > 0 else 'top')
    
    plt.tight_layout()
    plt.show()

# Plot for a selected stock
attr = factor_attribution(stock_excess['AAPL'], factors_3, ff3_results['AAPL'])
plot_factor_attribution('AAPL', attr)

Exercise 9.4: Portfolio Factor Exposure (Guided)

Your Task: Calculate the aggregate factor exposures for a portfolio of stocks.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def portfolio_factor_exposure(weights: dict, 
                              factor_results: dict) -> dict:
    first_result = list(factor_results.values())[0]
    factors = list(first_result['betas'].keys())

    portfolio_betas = {f: 0.0 for f in factors}
    portfolio_alpha = 0.0

    for stock, weight in weights.items():
        if stock in factor_results:
            result = factor_results[stock]
            for factor in factors:
                portfolio_betas[factor] += weight * result['betas'][factor]
            portfolio_alpha += weight * result['alpha']

    return {
        'betas': portfolio_betas,
        'alpha': portfolio_alpha,
        'alpha_annual': portfolio_alpha * 252
    }

# Test with different portfolios
portfolios = {
    'Tech': {'AAPL': 0.33, 'MSFT': 0.34, 'GOOGL': 0.33},
    'Defensive': {'JNJ': 0.5, 'XOM': 0.5},
    'Growth': {'TSLA': 0.5, 'AMZN': 0.5}
}

print("Portfolio Factor Exposures")
print("=" * 60)
for port_name, weights in portfolios.items():
    exp = portfolio_factor_exposure(weights, ff3_results)
    print(f"\n{port_name}:")
    print(f"  MKT: {exp['betas']['MKT']:.3f}")
    print(f"  SMB: {exp['betas']['SMB']:.3f}")
    print(f"  HML: {exp['betas']['HML']:.3f}")
    print(f"  Alpha (ann): {exp['alpha_annual']*100:.2f}%")

Exercise 9.5: Style Analysis (Open-ended)

Your Task:

Build a function that performs style analysis:

- Determine if a stock is value/growth based on HML loading
- Determine if it's small/large cap based on SMB loading
- Create a style quadrant visualization

Your implementation:

Exercise
Click to reveal solution
def style_analysis(factor_results: dict) -> pd.DataFrame:
    """
    Perform style analysis based on factor loadings.

    Args:
        factor_results: Stock -> factor model results

    Returns:
        DataFrame with style classifications
    """
    results = []

    for ticker, result in factor_results.items():
        smb = result['betas']['SMB']
        hml = result['betas']['HML']
        mkt = result['betas']['MKT']

        # Size classification
        size = 'Small' if smb > 0 else 'Large'

        # Value/Growth classification
        style = 'Value' if hml > 0 else 'Growth'

        # Combined style box
        quadrant = f'{size}-{style}'

        results.append({
            'Stock': ticker,
            'SMB': smb,
            'HML': hml,
            'MKT': mkt,
            'Size': size,
            'Style': style,
            'Quadrant': quadrant
        })

    return pd.DataFrame(results)

# Perform style analysis
styles = style_analysis(ff3_results)
print("Style Analysis")
print("=" * 70)
print(styles.to_string(index=False))

# Plot style quadrant
fig, ax = plt.subplots(figsize=(10, 8))

for _, row in styles.iterrows():
    color = {'Large-Value': 'blue', 'Large-Growth': 'green',
             'Small-Value': 'orange', 'Small-Growth': 'red'}[row['Quadrant']]
    ax.scatter(row['HML'], row['SMB'], s=200, c=color, 
              edgecolors='black', linewidth=1.5)
    ax.annotate(row['Stock'], (row['HML'], row['SMB']),
               xytext=(5, 5), textcoords='offset points')

ax.axhline(0, color='black', linestyle='-', linewidth=0.5)
ax.axvline(0, color='black', linestyle='-', linewidth=0.5)

# Add quadrant labels
ax.text(0.1, 0.1, 'Large\nGrowth', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.text(0.8, 0.1, 'Large\nValue', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.text(0.1, 0.85, 'Small\nGrowth', transform=ax.transAxes, fontsize=10, alpha=0.5)
ax.text(0.8, 0.85, 'Small\nValue', transform=ax.transAxes, fontsize=10, alpha=0.5)

ax.set_xlabel('HML Loading (Growth ↔ Value)')
ax.set_ylabel('SMB Loading (Large ↔ Small)')
ax.set_title('Style Box Analysis')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Exercise 9.6: Complete Factor Model System (Open-ended)

Your Task:

Build a comprehensive FactorModel class that:

- Fits CAPM, 3-factor, and 5-factor models
- Calculates alpha with significance
- Performs factor attribution
- Generates visualizations and reports

Your implementation:

Exercise
Click to reveal solution
class FactorModelAnalyzer:
    """
    Comprehensive factor model analysis system.

    Supports CAPM, 3-factor, and 5-factor models with
    alpha analysis, attribution, and style classification.
    """

    def __init__(self, stock_excess: pd.Series, 
                 market_excess: pd.Series,
                 factors: pd.DataFrame,
                 ticker: str = "Stock"):
        self.stock_excess = stock_excess
        self.market_excess = market_excess
        self.factors = factors
        self.ticker = ticker
        self.models = {}
        self._fit_all_models()

    def _fit_all_models(self):
        """Fit CAPM and multi-factor models."""
        # CAPM
        X_capm = sm.add_constant(self.market_excess)
        self.models['CAPM'] = sm.OLS(self.stock_excess, X_capm).fit()

        # Multi-factor
        X_mf = sm.add_constant(self.factors)
        self.models['Multi-Factor'] = sm.OLS(self.stock_excess, X_mf).fit()

    def get_alpha(self, model_name: str = 'Multi-Factor') -> dict:
        """Get alpha statistics."""
        model = self.models[model_name]
        return {
            'alpha_daily': model.params.iloc[0],
            'alpha_annual': model.params.iloc[0] * 252,
            't_stat': model.tvalues.iloc[0],
            'p_value': model.pvalues.iloc[0],
            'significant': model.pvalues.iloc[0] < 0.05
        }

    def get_betas(self, model_name: str = 'Multi-Factor') -> dict:
        """Get factor betas."""
        model = self.models[model_name]
        return model.params.iloc[1:].to_dict()

    def get_r_squared(self, model_name: str = 'Multi-Factor') -> float:
        """Get model R-squared."""
        return self.models[model_name].rsquared

    def factor_attribution(self) -> dict:
        """Decompose returns by factors."""
        betas = self.get_betas()
        alpha = self.get_alpha()['alpha_daily']

        contributions = {}
        for factor, beta in betas.items():
            contributions[factor] = beta * self.factors[factor].mean() * 252
        contributions['Alpha'] = alpha * 252

        return contributions

    def information_ratio(self) -> float:
        """Calculate Information Ratio."""
        alpha = self.get_alpha()['alpha_annual']
        te = self.models['Multi-Factor'].resid.std() * np.sqrt(252)
        return alpha / te

    def summary(self):
        """Print comprehensive summary."""
        print(f"\n{'='*60}")
        print(f"FACTOR MODEL ANALYSIS: {self.ticker}")
        print(f"{'='*60}")

        # Model comparison
        print(f"\nMODEL COMPARISON")
        print(f"{'-'*40}")
        for name, model in self.models.items():
            alpha = model.params.iloc[0] * 252 * 100
            print(f"{name}: R²={model.rsquared:.3f}, Alpha={alpha:+.2f}%")

        # Alpha analysis
        alpha = self.get_alpha()
        print(f"\nALPHA ANALYSIS")
        print(f"{'-'*40}")
        print(f"Alpha (annual): {alpha['alpha_annual']*100:.2f}%")
        print(f"t-statistic: {alpha['t_stat']:.3f}")
        print(f"p-value: {alpha['p_value']:.4f}")
        print(f"Significant: {'Yes' if alpha['significant'] else 'No'}")
        print(f"Information Ratio: {self.information_ratio():.3f}")

        # Factor betas
        print(f"\nFACTOR LOADINGS")
        print(f"{'-'*40}")
        for factor, beta in self.get_betas().items():
            print(f"{factor}: {beta:.3f}")

        # Attribution
        print(f"\nFACTOR ATTRIBUTION (Annual)")
        print(f"{'-'*40}")
        for factor, contrib in self.factor_attribution().items():
            print(f"{factor}: {contrib*100:+.2f}%")

# Test
analyzer = FactorModelAnalyzer(
    stock_excess['AAPL'],
    market_excess,
    factors_3,
    'AAPL'
)
analyzer.summary()

Module Project: Production Factor Analysis System

Build a comprehensive factor analysis system suitable for institutional use.

# YOUR CODE HERE - Module Project
Click to reveal solution
class ProductionFactorSystem:
    """
    Production-ready factor analysis system.

    Features:
    - Multi-stock factor model analysis
    - Rolling analysis for time-varying exposures
    - Portfolio-level factor attribution
    - Style classification
    - Comprehensive reporting
    """

    def __init__(self, returns: pd.DataFrame,
                 market_returns: pd.Series,
                 factors: pd.DataFrame,
                 risk_free_rate: float = 0.04):
        self.returns = returns
        self.market_returns = market_returns
        self.factors = factors
        self.rf = risk_free_rate
        self.daily_rf = risk_free_rate / 252

        # Calculate excess returns
        self.excess_returns = returns.sub(self.daily_rf, axis=0)
        self.market_excess = market_returns - self.daily_rf

        # Fit models for all stocks
        self.results = {}
        self._fit_all_models()

    def _fit_all_models(self):
        """Fit factor models for all stocks."""
        for ticker in self.returns.columns:
            X = sm.add_constant(self.factors)
            model = sm.OLS(self.excess_returns[ticker], X).fit()

            self.results[ticker] = {
                'model': model,
                'alpha': model.params.iloc[0],
                'alpha_annual': model.params.iloc[0] * 252,
                'alpha_tstat': model.tvalues.iloc[0],
                'alpha_pvalue': model.pvalues.iloc[0],
                'betas': model.params.iloc[1:].to_dict(),
                'r_squared': model.rsquared
            }

    def get_stock_summary(self, ticker: str) -> dict:
        """Get summary for a single stock."""
        return self.results[ticker]

    def portfolio_exposure(self, weights: dict) -> dict:
        """Calculate portfolio-level factor exposures."""
        factors = list(self.factors.columns)
        port_betas = {f: 0.0 for f in factors}
        port_alpha = 0.0

        for stock, weight in weights.items():
            if stock in self.results:
                for factor in factors:
                    port_betas[factor] += weight * self.results[stock]['betas'][factor]
                port_alpha += weight * self.results[stock]['alpha']

        return {
            'betas': port_betas,
            'alpha': port_alpha,
            'alpha_annual': port_alpha * 252
        }

    def style_classification(self) -> pd.DataFrame:
        """Classify stocks by style."""
        classifications = []
        for ticker, result in self.results.items():
            smb = result['betas'].get('SMB', 0)
            hml = result['betas'].get('HML', 0)

            size = 'Small' if smb > 0 else 'Large'
            style = 'Value' if hml > 0 else 'Growth'

            classifications.append({
                'Stock': ticker,
                'Size': size,
                'Style': style,
                'Quadrant': f'{size}-{style}'
            })

        return pd.DataFrame(classifications)

    def report(self, portfolio_weights: dict = None):
        """Generate comprehensive report."""
        print("\n" + "=" * 70)
        print("PRODUCTION FACTOR ANALYSIS REPORT")
        print("=" * 70)

        # Individual stocks
        print("\nINDIVIDUAL STOCK ANALYSIS")
        print("-" * 70)
        print(f"{'Stock':<8} {'Alpha':>10} {'MKT':>8} {'SMB':>8} {'HML':>8} {'R²':>8}")
        print("-" * 70)

        for ticker, result in self.results.items():
            print(f"{ticker:<8} {result['alpha_annual']*100:>9.2f}% "
                  f"{result['betas'].get('MKT', 0):>8.3f} "
                  f"{result['betas'].get('SMB', 0):>8.3f} "
                  f"{result['betas'].get('HML', 0):>8.3f} "
                  f"{result['r_squared']:>8.3f}")

        # Style classification
        print("\nSTYLE CLASSIFICATION")
        print("-" * 70)
        styles = self.style_classification()
        print(styles.to_string(index=False))

        # Portfolio analysis
        if portfolio_weights:
            print("\nPORTFOLIO FACTOR EXPOSURE")
            print("-" * 70)
            port_exp = self.portfolio_exposure(portfolio_weights)
            print(f"Weights: {portfolio_weights}")
            print(f"Portfolio Alpha (ann): {port_exp['alpha_annual']*100:.2f}%")
            for factor, beta in port_exp['betas'].items():
                print(f"  {factor} Beta: {beta:.3f}")

# Test
system = ProductionFactorSystem(
    returns=stock_returns,
    market_returns=market_returns,
    factors=factors_3
)

test_weights = {'AAPL': 0.3, 'MSFT': 0.3, 'JNJ': 0.2, 'XOM': 0.2}
system.report(test_weights)

Key Takeaways

What You Learned

1. CAPM and Beta

  • Beta measures sensitivity to market movements
  • Can be calculated via covariance or regression
  • Rolling beta shows time-varying exposure
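The equivalence of the two beta calculations is worth verifying once. A minimal standalone sketch on synthetic data (the "true" beta of 1.3 and the noise levels are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
market = rng.normal(0.0004, 0.01, 1000)                       # synthetic daily market returns
stock = 0.0001 + 1.3 * market + rng.normal(0, 0.008, 1000)    # true beta = 1.3 plus noise

# Route 1: beta as Cov(stock, market) / Var(market)
beta_cov = np.cov(stock, market)[0, 1] / np.var(market, ddof=1)

# Route 2: beta as the OLS slope of stock on market
beta_ols = np.polyfit(market, stock, 1)[0]

print(f"Beta (covariance): {beta_cov:.4f}")
print(f"Beta (regression): {beta_ols:.4f}")
```

The two agree to floating-point precision, since the OLS slope is algebraically Cov(x, y)/Var(x).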

2. Multi-Factor Models

  • Fama-French 3-factor: Market, Size (SMB), Value (HML)
  • 5-factor adds Profitability (RMW) and Investment (CMA)
  • More factors explain more variance but risk overfitting

3. Alpha Analysis

  • Alpha is return not explained by factor exposures
  • Statistical significance at the 5% level requires |t-stat| > 1.96
  • Information Ratio measures alpha per unit tracking error

4. Factor Attribution

  • Decomposes returns into factor contributions
  • Style analysis classifies by size and value loadings
  • Portfolio exposure is weighted average of stock exposures
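The last point is just a dot product of weights and loadings. A tiny sketch with hypothetical betas (the numbers below are illustrative, not estimates from the data above):

```python
# Hypothetical single-factor (market) betas for four stocks
betas = {'AAPL': 1.20, 'MSFT': 1.05, 'JNJ': 0.55, 'XOM': 0.80}
weights = {'AAPL': 0.3, 'MSFT': 0.3, 'JNJ': 0.2, 'XOM': 0.2}

# Portfolio beta = weight-weighted average of stock betas
port_beta = sum(weights[s] * betas[s] for s in weights)
print(f"Portfolio market beta: {port_beta:.3f}")  # 0.945
```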

Coming Up Next

In Module 10: Monte Carlo Simulation, we'll explore:

  • Geometric Brownian Motion simulation
  • Correlated multi-asset simulation
  • Option pricing with Monte Carlo
  • Portfolio simulation and scenario analysis


Congratulations on completing Module 9!

Module 10: Monte Carlo Simulation

Course 3: Quantitative Finance & Portfolio Theory
Part 4: Simulation & Analytics


Learning Objectives

By the end of this module, you will be able to:

  1. Generate reproducible random samples for financial simulations
  2. Simulate stock price paths using Geometric Brownian Motion
  3. Create correlated multi-asset simulations with Cholesky decomposition
  4. Apply Monte Carlo methods to portfolio analysis and option pricing

Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Modules 7-8: Risk Modeling

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from scipy.linalg import cholesky
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Module 10: Monte Carlo Simulation - Ready!')

Load Data

# Download data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2015-01-01', end='2024-01-01', progress=False)

# Handle MultiIndex columns
if isinstance(data.columns, pd.MultiIndex):
    if 'Adj Close' in data.columns.get_level_values(0):
        prices = data['Adj Close']
    else:
        prices = data['Close']  # fall back so `prices` is always bound
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

returns = prices.pct_change().dropna()
print(f'Data loaded: {len(prices)} days')
print(f'Assets: {list(returns.columns)}')

Section 10.1: Random Number Generation

Monte Carlo methods depend on generating random numbers. Understanding how to properly generate and control randomness is essential for reproducible research.

In this section, you will learn:

  • Pseudo-random number generation with seeds
  • Modern NumPy Generator objects
  • Generating samples from different distributions

10.1.1 Reproducibility with Seeds

Computers generate "pseudo-random" numbers - they appear random but are deterministic given a starting seed. This is crucial for reproducible research.

# Basic random number generation
print("Random Numbers Without Seed (different each run):")
print(np.random.randn(5))
print(np.random.randn(5))

print("\nWith Seed (reproducible):")
np.random.seed(42)
print(np.random.randn(5))

np.random.seed(42)  # Reset to same seed
print(np.random.randn(5))  # Same numbers!

10.1.2 Modern Generator Objects

NumPy's modern approach uses Generator objects for better control and performance.

# Modern NumPy random generation
rng = np.random.default_rng(seed=42)

print("Using Generator object:")
print(f"Standard normal: {rng.standard_normal(5)}")
print(f"Uniform [0,1]:   {rng.uniform(0, 1, 5)}")
print(f"Integers [1,10]: {rng.integers(1, 10, 5)}")

# Multiple independent generators
rng1 = np.random.default_rng(seed=100)
rng2 = np.random.default_rng(seed=200)

print(f"\nGenerator 1: {rng1.standard_normal(3)}")
print(f"Generator 2: {rng2.standard_normal(3)}")

10.1.3 Different Distributions

Financial returns often have "fat tails" - extreme values occur more frequently than a normal distribution predicts.

rng = np.random.default_rng(seed=42)
n_samples = 10000

# Generate samples from various distributions
normal = rng.normal(loc=0, scale=1, size=n_samples)
student_t = rng.standard_t(df=5, size=n_samples)
uniform = rng.uniform(-1, 1, size=n_samples)
lognormal = rng.lognormal(mean=0, sigma=0.5, size=n_samples)

# Compare distributions
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

distributions = [
    (normal, 'Normal(0, 1)', 'steelblue'),
    (student_t, 'Student-t (df=5)', 'orange'),
    (uniform, 'Uniform(-1, 1)', 'green'),
    (lognormal, 'Log-Normal(0, 0.5)', 'crimson')
]

for ax, (data, name, color) in zip(axes.flatten(), distributions):
    ax.hist(data, bins=50, density=True, alpha=0.7, color=color, edgecolor='white')
    ax.set_title(name)
    ax.set_xlabel('Value')
    ax.set_ylabel('Density')

plt.tight_layout()
plt.show()

Exercise 10.1: Fat Tail Analysis (Guided)

Your Task: Compare the frequency of extreme events between Normal and Student-t distributions.

Fill in the blanks to complete the analysis:

Exercise
Click to reveal solution
def compare_tail_events(n_samples: int = 100000, threshold: float = 3.0, df: int = 4) -> dict:
    """
    Compare extreme events between Normal and Student-t distributions.
    """
    rng = np.random.default_rng(seed=42)

    # Generate normal samples with mean=0, std=1
    normal_samples = rng.normal(0, 1, n_samples)

    # Generate Student-t samples
    t_samples = rng.standard_t(df=df, size=n_samples)

    # Count events beyond threshold std devs
    normal_extreme = np.sum(np.abs(normal_samples) > threshold)
    t_extreme = np.sum(np.abs(t_samples) > threshold)

    return {
        'normal_count': normal_extreme,
        'normal_pct': normal_extreme / n_samples * 100,
        't_count': t_extreme,
        't_pct': t_extreme / n_samples * 100,
        'ratio': t_extreme / max(normal_extreme, 1)
    }

# Test
result = compare_tail_events()
print(f"Normal extreme events: {result['normal_count']} ({result['normal_pct']:.3f}%)")
print(f"Student-t extreme events: {result['t_count']} ({result['t_pct']:.3f}%)")
print(f"Student-t has {result['ratio']:.1f}x more extreme events")

Section 10.2: Simulating Price Paths

The standard model for stock prices is Geometric Brownian Motion (GBM).

In this section, you will learn:

  • The GBM model and its assumptions
  • Simulating single-asset price paths
  • Adding fat tails to simulations

10.2.1 Geometric Brownian Motion

The discrete-time solution for GBM is:

$$S_{t+1} = S_t \exp\left[(\mu - \frac{\sigma^2}{2})\Delta t + \sigma\sqrt{\Delta t} \cdot Z\right]$$

Where $Z \sim N(0,1)$.

def simulate_gbm(S0: float, mu: float, sigma: float, T: float, 
                 n_steps: int, n_paths: int, seed: int = None) -> np.ndarray:
    """
    Simulate stock price paths using Geometric Brownian Motion.
    
    Args:
        S0: Initial stock price
        mu: Annual drift (expected return)
        sigma: Annual volatility
        T: Time horizon in years
        n_steps: Number of time steps
        n_paths: Number of simulation paths
        seed: Random seed for reproducibility
    
    Returns:
        Price paths array of shape (n_steps + 1, n_paths)
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    
    # Pre-compute constants
    drift = (mu - 0.5 * sigma**2) * dt
    diffusion = sigma * np.sqrt(dt)
    
    # Generate random shocks
    Z = rng.standard_normal((n_steps, n_paths))
    
    # Calculate log returns
    log_returns = drift + diffusion * Z
    
    # Build price paths
    log_prices = np.zeros((n_steps + 1, n_paths))
    log_prices[0] = np.log(S0)
    log_prices[1:] = np.log(S0) + np.cumsum(log_returns, axis=0)
    
    return np.exp(log_prices)

# Estimate parameters from SPY
spy_returns = returns['SPY']
mu_annual = spy_returns.mean() * 252
sigma_annual = spy_returns.std() * np.sqrt(252)
S0 = float(prices['SPY'].iloc[-1])

print(f"SPY Parameters:")
print(f"  Current price: ${S0:.2f}")
print(f"  Annual return (mu): {mu_annual*100:.1f}%")
print(f"  Annual volatility (sigma): {sigma_annual*100:.1f}%")
# Simulate 1 year of SPY prices
T = 1  # 1 year
n_steps = 252  # Daily steps
n_paths = 1000

paths = simulate_gbm(S0, mu_annual, sigma_annual, T, n_steps, n_paths, seed=42)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Sample paths
ax1 = axes[0]
time_grid = np.linspace(0, T, n_steps + 1)

for i in range(min(100, n_paths)):
    ax1.plot(time_grid, paths[:, i], alpha=0.1, color='steelblue', linewidth=0.5)

# Add percentile bands
percentiles = [5, 25, 50, 75, 95]
for p in percentiles:
    ax1.plot(time_grid, np.percentile(paths, p, axis=1), linewidth=2, label=f'{p}th pct')

ax1.axhline(S0, color='black', linestyle='--', label='Initial Price')
ax1.set_xlabel('Time (years)')
ax1.set_ylabel('Price ($)')
ax1.set_title(f'SPY Price Simulation ({n_paths} paths)')
ax1.legend(loc='upper left')

# Right: Terminal price distribution
ax2 = axes[1]
terminal_prices = paths[-1, :]
ax2.hist(terminal_prices, bins=50, density=True, alpha=0.7, color='steelblue', edgecolor='white')
ax2.axvline(S0, color='black', linestyle='--', linewidth=2, label=f'Initial: ${S0:.0f}')
ax2.axvline(np.mean(terminal_prices), color='orange', linewidth=2, label=f'Mean: ${np.mean(terminal_prices):.0f}')
ax2.set_xlabel('Terminal Price ($)')
ax2.set_ylabel('Density')
ax2.set_title('Distribution of 1-Year Ending Prices')
ax2.legend()

plt.tight_layout()
plt.show()

# Summary statistics
terminal_returns = (terminal_prices / S0 - 1) * 100
print(f"\nTerminal Price Statistics:")
print(f"  Mean: ${np.mean(terminal_prices):.2f} ({np.mean(terminal_returns):+.1f}%)")
print(f"  Median: ${np.median(terminal_prices):.2f}")
print(f"  5th percentile: ${np.percentile(terminal_prices, 5):.2f}")
print(f"  95th percentile: ${np.percentile(terminal_prices, 95):.2f}")

10.2.2 Adding Fat Tails

GBM assumes normally distributed returns, but real returns have fat tails. We can draw the random shocks from a Student-t distribution instead.

def simulate_gbm_fat_tails(S0: float, mu: float, sigma: float, T: float,
                          n_steps: int, n_paths: int, df: int = 5, 
                          seed: int = None) -> np.ndarray:
    """
    Simulate price paths with fat-tailed returns (Student-t).
    
    Args:
        df: Degrees of freedom for Student-t (lower = fatter tails)
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    drift = (mu - 0.5 * sigma**2) * dt
    
    # Student-t has variance = df/(df-2), scale to match desired sigma
    scale = np.sqrt((df - 2) / df)
    diffusion = sigma * np.sqrt(dt) * scale
    
    # Generate Student-t shocks
    Z = rng.standard_t(df, size=(n_steps, n_paths))
    log_returns = drift + diffusion * Z
    
    log_prices = np.zeros((n_steps + 1, n_paths))
    log_prices[0] = np.log(S0)
    log_prices[1:] = np.log(S0) + np.cumsum(log_returns, axis=0)
    
    return np.exp(log_prices)

# Compare normal vs fat-tailed simulations
paths_normal = simulate_gbm(S0, mu_annual, sigma_annual, T, n_steps, n_paths, seed=42)
paths_fat = simulate_gbm_fat_tails(S0, mu_annual, sigma_annual, T, n_steps, n_paths, df=5, seed=42)

# Compare terminal distributions
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(paths_normal[-1, :], bins=50, density=True, alpha=0.5, label='Normal', color='steelblue')
ax.hist(paths_fat[-1, :], bins=50, density=True, alpha=0.5, label='Fat-tailed (df=5)', color='orange')
ax.axvline(S0, color='black', linestyle='--', linewidth=2, label='Initial Price')
ax.set_xlabel('Terminal Price ($)')
ax.set_ylabel('Density')
ax.set_title('Terminal Price Distribution: Normal vs Fat-Tailed')
ax.legend()
plt.show()

# Compare tail statistics
print("Tail Statistics Comparison:")
print(f"  1st percentile - Normal: ${np.percentile(paths_normal[-1], 1):.2f}")
print(f"  1st percentile - Fat-tailed: ${np.percentile(paths_fat[-1], 1):.2f}")

Exercise 10.2: Monte Carlo VaR Calculator (Guided)

Your Task: Use Monte Carlo simulation to estimate 1-day VaR at different confidence levels.

Fill in the blanks to complete the VaR calculator:

Exercise
Click to reveal solution
def monte_carlo_var(S0: float, mu: float, sigma: float, 
                    n_simulations: int = 10000, 
                    confidence: float = 0.95) -> dict:
    """
    Calculate 1-day VaR using Monte Carlo simulation.
    """
    rng = np.random.default_rng(seed=42)

    # Calculate daily parameters
    daily_mu = mu / 252
    daily_sigma = sigma / np.sqrt(252)

    # Generate 1-day returns using normal distribution
    sim_returns = rng.normal(daily_mu, daily_sigma, n_simulations)

    # Calculate VaR as negative percentile of returns
    alpha = 1 - confidence
    var = -np.percentile(sim_returns, alpha * 100)

    # Calculate Expected Shortfall
    threshold = np.percentile(sim_returns, alpha * 100)
    es = -np.mean(sim_returns[sim_returns <= threshold])

    return {
        'var': var,
        'var_dollar': var * S0,
        'es': es,
        'es_dollar': es * S0
    }

# Test
result = monte_carlo_var(S0, mu_annual, sigma_annual, confidence=0.95)
print(f"95% VaR: {result['var']*100:.2f}% (${result['var_dollar']:.2f})")
print(f"95% ES:  {result['es']*100:.2f}% (${result['es_dollar']:.2f})")
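Because the simulation above draws from a normal distribution, the Monte Carlo VaR should converge to the closed-form normal VaR, $-(\mu_d + z_\alpha \sigma_d)$, where $z_\alpha$ is the normal quantile. A standalone sanity check (the parameters below are illustrative, not the SPY estimates):

```python
import numpy as np
from scipy import stats

mu_annual, sigma_annual = 0.08, 0.18          # illustrative annual parameters
daily_mu = mu_annual / 252
daily_sigma = sigma_annual / np.sqrt(252)
confidence = 0.95

# Closed-form VaR for normally distributed returns
z = stats.norm.ppf(1 - confidence)            # ~ -1.645
var_analytic = -(daily_mu + z * daily_sigma)

# Monte Carlo estimate with a large sample
rng = np.random.default_rng(seed=42)
sim = rng.normal(daily_mu, daily_sigma, 1_000_000)
var_mc = -np.percentile(sim, (1 - confidence) * 100)

print(f"Analytic 95% VaR: {var_analytic*100:.3f}%")
print(f"MC 95% VaR:       {var_mc*100:.3f}%")
```

With a million draws the two agree to a few basis points; the Monte Carlo error shrinks roughly as $1/\sqrt{n}$.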

Section 10.3: Correlated Simulations

Real portfolios hold multiple correlated assets, so our simulations must preserve the correlation structure between them.

In this section, you will learn:

  • Cholesky decomposition for correlation
  • Simulating correlated multi-asset paths
  • Portfolio value simulation

10.3.1 Cholesky Decomposition

To generate correlated random variables, we use the Cholesky decomposition:

$$\Sigma = LL^T$$

If $Z$ is a vector of independent standard normals, then $X = LZ$ has the desired correlation structure.
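A quick standalone check that this construction works, using a single 2-asset target correlation of 0.7:

```python
import numpy as np
from scipy.linalg import cholesky

# Target 2x2 correlation matrix
target = np.array([[1.0, 0.7],
                   [0.7, 1.0]])
L = cholesky(target, lower=True)

rng = np.random.default_rng(seed=42)
Z = rng.standard_normal((2, 100_000))   # independent standard normals
X = L @ Z                               # correlated normals via X = LZ

empirical = np.corrcoef(X)
print(f"Target corr:    0.700")
print(f"Empirical corr: {empirical[0, 1]:.3f}")
```

The empirical correlation matches the target up to sampling error, which is exactly what the multi-asset simulation below relies on.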

# Calculate correlation matrix from historical data
corr_matrix = returns.corr()
print("Historical Correlation Matrix:")
print(corr_matrix.round(3))

# Cholesky decomposition
L = cholesky(corr_matrix, lower=True)
print(f"\nCholesky matrix L (lower triangular):")
print(pd.DataFrame(L, index=corr_matrix.index, columns=corr_matrix.columns).round(3))

# Verify: L @ L.T should equal correlation matrix
print(f"\nVerification (L @ L.T):")
print(pd.DataFrame(L @ L.T, index=corr_matrix.index, columns=corr_matrix.columns).round(3))
def simulate_correlated_gbm(S0_vec: list, mu_vec: list, sigma_vec: list, 
                            corr_matrix: np.ndarray, T: float, 
                            n_steps: int, n_paths: int, seed: int = None) -> dict:
    """
    Simulate correlated multi-asset price paths.
    
    Returns:
        Dictionary mapping asset index to price paths array
    """
    rng = np.random.default_rng(seed)
    n_assets = len(S0_vec)
    dt = T / n_steps
    
    # Cholesky decomposition
    L = cholesky(corr_matrix, lower=True)
    
    # Generate independent normals
    Z = rng.standard_normal((n_steps, n_assets, n_paths))
    
    # Apply correlation structure
    corr_Z = np.zeros_like(Z)
    for t in range(n_steps):
        corr_Z[t] = L @ Z[t]
    
    # Calculate price paths for each asset
    paths = {}
    for i in range(n_assets):
        drift = (mu_vec[i] - 0.5 * sigma_vec[i]**2) * dt
        diffusion = sigma_vec[i] * np.sqrt(dt)
        log_returns = drift + diffusion * corr_Z[:, i, :]
        
        log_prices = np.zeros((n_steps + 1, n_paths))
        log_prices[0] = np.log(S0_vec[i])
        log_prices[1:] = np.log(S0_vec[i]) + np.cumsum(log_returns, axis=0)
        paths[i] = np.exp(log_prices)
    
    return paths

# Get parameters for all assets
assets = returns.columns.tolist()
S0_vec = [float(prices[a].iloc[-1]) for a in assets]
mu_vec = [float(returns[a].mean() * 252) for a in assets]
sigma_vec = [float(returns[a].std() * np.sqrt(252)) for a in assets]

print("Asset Parameters:")
for i, asset in enumerate(assets):
    print(f"{asset}: S0=${S0_vec[i]:.2f}, mu={mu_vec[i]*100:.1f}%, sigma={sigma_vec[i]*100:.1f}%")
# Simulate correlated paths
T = 1
n_steps = 252
n_paths = 5000

corr_paths = simulate_correlated_gbm(
    S0_vec, mu_vec, sigma_vec, corr_matrix.values, 
    T, n_steps, n_paths, seed=42
)

# Verify correlation is preserved
sim_returns_all = {}
for i, asset in enumerate(assets):
    path_returns = np.diff(np.log(corr_paths[i]), axis=0)
    sim_returns_all[asset] = path_returns.flatten()

sim_returns_df = pd.DataFrame(sim_returns_all)
sim_corr = sim_returns_df.corr()

print("Simulated Correlation Matrix:")
print(sim_corr.round(3))
print("\nDifference from Original (should be near zero):")
print((sim_corr - corr_matrix).round(3))

10.3.2 Portfolio Simulation

def simulate_portfolio_value(corr_paths: dict, weights: dict, 
                            assets: list, initial_value: float = 100000) -> np.ndarray:
    """
    Calculate portfolio value paths from correlated asset simulations.

    Assumes the weights sum to 1 (buy-and-hold, no rebalancing).
    """
    n_rows, n_paths = corr_paths[0].shape   # n_rows = n_steps + 1
    port_value = np.ones((n_rows, n_paths))
    
    for i, asset in enumerate(assets):
        if asset in weights:
            # Growth factor of each asset relative to its starting price
            normalized = corr_paths[i] / corr_paths[i][0, :]
            port_value += weights[asset] * (normalized - 1)
    
    return port_value * initial_value

# Define portfolio weights
portfolio_weights = {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1}
initial_value = 100000

# Calculate portfolio paths
port_paths = simulate_portfolio_value(corr_paths, portfolio_weights, assets, initial_value)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Portfolio value paths
ax1 = axes[0]
time_grid = np.linspace(0, T, n_steps + 1)

for j in range(min(100, n_paths)):
    ax1.plot(time_grid, port_paths[:, j], alpha=0.05, color='steelblue', linewidth=0.5)

ax1.fill_between(time_grid,
                 np.percentile(port_paths, 5, axis=1),
                 np.percentile(port_paths, 95, axis=1),
                 alpha=0.3, color='steelblue', label='5th-95th percentile')
ax1.plot(time_grid, np.median(port_paths, axis=1), 'k-', linewidth=2, label='Median')
ax1.axhline(initial_value, color='red', linestyle='--', label='Initial Value')
ax1.set_xlabel('Time (years)')
ax1.set_ylabel('Portfolio Value ($)')
ax1.set_title(f'Portfolio Simulation ({n_paths} paths)')
ax1.legend(loc='upper left')

# Right: Terminal value distribution
ax2 = axes[1]
terminal_values = port_paths[-1, :]
ax2.hist(terminal_values / 1000, bins=50, density=True, alpha=0.7, color='steelblue')
ax2.axvline(initial_value / 1000, color='red', linestyle='--', linewidth=2, label=f'Initial: ${initial_value/1000:.0f}k')
ax2.axvline(np.mean(terminal_values) / 1000, color='orange', linewidth=2, label=f'Mean: ${np.mean(terminal_values)/1000:.0f}k')
ax2.set_xlabel('Terminal Value ($k)')
ax2.set_ylabel('Density')
ax2.set_title('Distribution of 1-Year Portfolio Values')
ax2.legend()

plt.tight_layout()
plt.show()

# Summary
print(f"\nPortfolio Simulation Summary (Initial: ${initial_value:,.0f})")
print(f"  Expected Value: ${np.mean(terminal_values):,.0f}")
print(f"  Probability of Loss: {np.mean(terminal_values < initial_value)*100:.1f}%")
print(f"  Probability of >20% Gain: {np.mean(terminal_values > initial_value*1.2)*100:.1f}%")

Exercise 10.3: Portfolio Risk Comparison (Guided)

Your Task: Compare risk metrics for different portfolio allocations using Monte Carlo.

Fill in the blanks to complete the comparison:

Exercise
Click to reveal solution
def calculate_portfolio_risk_metrics(corr_paths: dict, weights: dict, 
                                     assets: list, initial_value: float) -> dict:
    """
    Calculate risk metrics for a portfolio from Monte Carlo simulation.
    """
    port_paths = simulate_portfolio_value(corr_paths, weights, assets, initial_value)
    terminal_values = port_paths[-1, :]

    # Calculate terminal returns as percentage
    terminal_returns = (terminal_values / initial_value - 1)

    # Calculate VaR at 95%
    var_95 = -np.percentile(terminal_returns, 5)

    # Calculate Expected Shortfall
    threshold = np.percentile(terminal_returns, 5)
    es_95 = -np.mean(terminal_returns[terminal_returns <= threshold])

    return {
        'mean_return': np.mean(terminal_returns),
        'volatility': np.std(terminal_returns),
        'var_95': var_95,
        'es_95': es_95,
        'prob_loss': np.mean(terminal_returns < 0)
    }

# Test with two portfolios
aggressive = {'SPY': 0.7, 'QQQ': 0.3, 'TLT': 0.0, 'GLD': 0.0}
defensive = {'SPY': 0.2, 'QQQ': 0.0, 'TLT': 0.4, 'GLD': 0.4}

agg_metrics = calculate_portfolio_risk_metrics(corr_paths, aggressive, assets, initial_value)
def_metrics = calculate_portfolio_risk_metrics(corr_paths, defensive, assets, initial_value)

print(f"Aggressive - Return: {agg_metrics['mean_return']*100:.1f}%, VaR: {agg_metrics['var_95']*100:.1f}%")
print(f"Defensive - Return: {def_metrics['mean_return']*100:.1f}%, VaR: {def_metrics['var_95']*100:.1f}%")

Section 10.4: Applications

In this section, you will learn:

  • Option pricing with Monte Carlo
  • Retirement planning simulations
  • Practical considerations

10.4.1 Option Pricing

Monte Carlo can price path-dependent and exotic options that lack analytical solutions.

def monte_carlo_european_option(S0: float, K: float, T: float, r: float, 
                                sigma: float, option_type: str = 'call', 
                                n_paths: int = 100000, seed: int = None) -> dict:
    """
    Price a European option using Monte Carlo simulation.
    
    Args:
        S0: Current stock price
        K: Strike price
        T: Time to expiration (years)
        r: Risk-free rate
        sigma: Volatility
        option_type: 'call' or 'put'
    
    Returns:
        Dictionary with price and statistics
    """
    rng = np.random.default_rng(seed)
    
    # Simulate terminal stock prices under risk-neutral measure
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    
    # Calculate payoffs
    if option_type == 'call':
        payoffs = np.maximum(ST - K, 0)
    else:
        payoffs = np.maximum(K - ST, 0)
    
    # Discount expected payoff
    discounted_payoffs = np.exp(-r * T) * payoffs
    price = np.mean(discounted_payoffs)
    std_error = np.std(discounted_payoffs) / np.sqrt(n_paths)
    
    return {
        'price': price,
        'std_error': std_error,
        'ci_95': (price - 1.96 * std_error, price + 1.96 * std_error)
    }

def black_scholes(S0: float, K: float, T: float, r: float, 
                  sigma: float, option_type: str = 'call') -> float:
    """Analytical Black-Scholes price for comparison."""
    d1 = (np.log(S0/K) + (r + 0.5*sigma**2)*T) / (sigma*np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    
    if option_type == 'call':
        price = S0 * stats.norm.cdf(d1) - K * np.exp(-r*T) * stats.norm.cdf(d2)
    else:
        price = K * np.exp(-r*T) * stats.norm.cdf(-d2) - S0 * stats.norm.cdf(-d1)
    
    return price

# Price an option
S0_opt = 100
K = 105
T_opt = 0.5
r = 0.05
sigma_opt = 0.20

mc_result = monte_carlo_european_option(S0_opt, K, T_opt, r, sigma_opt, 'call', seed=42)
bs_price = black_scholes(S0_opt, K, T_opt, r, sigma_opt, 'call')

print("European Call Option Pricing")
print(f"  Black-Scholes Price: ${bs_price:.4f}")
print(f"  Monte Carlo Price:   ${mc_result['price']:.4f}")
print(f"  MC Standard Error:   ${mc_result['std_error']:.4f}")
print(f"  Difference: ${abs(mc_result['price'] - bs_price):.4f}")
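The European case has a Black-Scholes benchmark, but the real value of Monte Carlo is for path-dependent payoffs with no closed form. A sketch of an arithmetic-average Asian call under the same risk-neutral GBM dynamics (parameters are illustrative; `monte_carlo_asian_call` is a name introduced here, not from the module above):

```python
import numpy as np

def monte_carlo_asian_call(S0, K, T, r, sigma, n_steps=126, n_paths=50_000, seed=42):
    """Price an arithmetic-average Asian call: payoff = max(mean(S_t) - K, 0)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    Z = rng.standard_normal((n_steps, n_paths))
    log_returns = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
    # Full paths are needed because the payoff depends on the average price
    paths = S0 * np.exp(np.cumsum(log_returns, axis=0))
    avg_price = paths.mean(axis=0)          # average over monitoring dates
    payoffs = np.exp(-r * T) * np.maximum(avg_price - K, 0)
    return payoffs.mean(), payoffs.std() / np.sqrt(n_paths)

price, se = monte_carlo_asian_call(S0=100, K=105, T=0.5, r=0.05, sigma=0.20)
print(f"Asian call: ${price:.4f} ± {1.96*se:.4f}")
```

Because averaging dampens volatility, the Asian price should come in well below the European call on the same terms.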

10.4.2 Retirement Planning

def retirement_simulation(initial_savings: float, annual_contribution: float,
                         years_to_retirement: int, years_in_retirement: int,
                         annual_withdrawal: float, mu: float, sigma: float,
                         n_paths: int = 10000, seed: int = None) -> dict:
    """
    Simulate retirement outcomes.
    """
    rng = np.random.default_rng(seed)
    total_years = years_to_retirement + years_in_retirement
    
    # Generate annual returns
    annual_returns = rng.normal(mu, sigma, (total_years, n_paths))
    
    # Initialize wealth
    wealth = np.zeros((total_years + 1, n_paths))
    wealth[0, :] = initial_savings
    
    # Accumulation phase
    for year in range(years_to_retirement):
        wealth[year + 1, :] = wealth[year, :] * (1 + annual_returns[year, :]) + annual_contribution
    
    # Distribution phase
    for year in range(years_to_retirement, total_years):
        new_wealth = wealth[year, :] * (1 + annual_returns[year, :]) - annual_withdrawal
        wealth[year + 1, :] = np.maximum(new_wealth, 0)
    
    final_wealth = wealth[-1, :]
    
    return {
        'wealth_paths': wealth,
        'prob_success': np.mean(final_wealth > 0),
        'median_retirement': np.median(wealth[years_to_retirement, :]),
        'median_final': np.median(final_wealth[final_wealth > 0]) if np.sum(final_wealth > 0) > 0 else 0
    }

# Run simulation
result = retirement_simulation(
    initial_savings=100000,
    annual_contribution=20000,
    years_to_retirement=25,
    years_in_retirement=30,
    annual_withdrawal=80000,
    mu=0.07,
    sigma=0.15,
    seed=42
)

print(f"Retirement Planning Results")
print(f"  Probability of Success: {result['prob_success']*100:.1f}%")
print(f"  Median Wealth at Retirement: ${result['median_retirement']:,.0f}")
print(f"  Median Final Wealth (if successful): ${result['median_final']:,.0f}")

Exercise 10.4: Option Price Sensitivity (Open-ended)

Your Task:

Build a function that:

  • Calculates option prices for a range of strike prices
  • Uses Monte Carlo simulation
  • Returns a DataFrame with strike, call price, and put price
  • Includes confidence intervals

Your implementation:

Exercise
Click to reveal solution
def option_price_sensitivity(S0: float, T: float, r: float, sigma: float,
                            strikes: list, n_paths: int = 50000) -> pd.DataFrame:
    """
    Calculate option prices for multiple strikes using Monte Carlo.

    Args:
        S0: Current stock price
        T: Time to expiration
        r: Risk-free rate
        sigma: Volatility
        strikes: List of strike prices

    Returns:
        DataFrame with strikes and option prices
    """
    rng = np.random.default_rng(seed=42)

    # Simulate terminal prices once
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    discount = np.exp(-r * T)

    results = []
    for K in strikes:
        # Call payoffs
        call_payoffs = np.maximum(ST - K, 0) * discount
        call_price = np.mean(call_payoffs)
        call_se = np.std(call_payoffs) / np.sqrt(n_paths)

        # Put payoffs
        put_payoffs = np.maximum(K - ST, 0) * discount
        put_price = np.mean(put_payoffs)
        put_se = np.std(put_payoffs) / np.sqrt(n_paths)

        results.append({
            'Strike': K,
            'Call Price': call_price,
            'Call CI': f"{call_price:.3f} ± {1.96*call_se:.3f}",
            'Put Price': put_price,
            'Put CI': f"{put_price:.3f} ± {1.96*put_se:.3f}",
            'Moneyness': K / S0
        })

    return pd.DataFrame(results)

# Test
strikes = [90, 95, 100, 105, 110]
sensitivity_df = option_price_sensitivity(S0=100, T=0.5, r=0.05, sigma=0.20, strikes=strikes)
print(sensitivity_df.to_string(index=False))

Exercise 10.5: Retirement Scenario Analysis (Open-ended)

Your Task:

Build a function that:

  • Takes multiple withdrawal rate scenarios
  • Runs retirement simulations for each
  • Returns success probability for each scenario
  • Identifies the safe withdrawal rate (>95% success)

Your implementation:

Exercise
Click to reveal solution
def retirement_scenario_analysis(initial_savings: float, annual_contribution: float,
                                years_to_retirement: int, years_in_retirement: int,
                                withdrawal_rates: list, mu: float, sigma: float,
                                n_paths: int = 5000) -> pd.DataFrame:
    """
    Analyze multiple withdrawal rate scenarios.

    Args:
        withdrawal_rates: List of withdrawal rates as fraction of initial retirement wealth

    Returns:
        DataFrame with scenario results
    """
    rng = np.random.default_rng(seed=42)
    total_years = years_to_retirement + years_in_retirement

    # Generate returns once
    annual_returns = rng.normal(mu, sigma, (total_years, n_paths))

    # Accumulation phase (same for all scenarios)
    wealth_at_retirement = np.zeros(n_paths)
    wealth = initial_savings * np.ones(n_paths)

    for year in range(years_to_retirement):
        wealth = wealth * (1 + annual_returns[year, :]) + annual_contribution

    wealth_at_retirement = wealth.copy()

    results = []
    for rate in withdrawal_rates:
        # Reset to retirement wealth
        wealth = wealth_at_retirement.copy()

        # Distribution phase with fixed withdrawal rate
        annual_withdrawal = wealth_at_retirement * rate

        for year in range(years_to_retirement, total_years):
            wealth = wealth * (1 + annual_returns[year, :]) - annual_withdrawal
            wealth = np.maximum(wealth, 0)

        success_rate = np.mean(wealth > 0)
        median_final = np.median(wealth[wealth > 0]) if np.sum(wealth > 0) > 0 else 0

        results.append({
            'Withdrawal Rate': f"{rate*100:.1f}%",
            'Annual Withdrawal': f"${np.median(annual_withdrawal):,.0f}",
            'Success Rate': f"{success_rate*100:.1f}%",
            'Median Final': f"${median_final:,.0f}",
            'Safe': '✓' if success_rate >= 0.95 else ''
        })

    return pd.DataFrame(results)

# Test
withdrawal_rates = [0.03, 0.035, 0.04, 0.045, 0.05, 0.055, 0.06]
scenarios = retirement_scenario_analysis(
    initial_savings=100000,
    annual_contribution=20000,
    years_to_retirement=25,
    years_in_retirement=30,
    withdrawal_rates=withdrawal_rates,
    mu=0.07,
    sigma=0.15
)
print(scenarios.to_string(index=False))

Exercise 10.6: Complete Monte Carlo Engine (Open-ended)

Your Task:

Build a comprehensive Monte Carlo simulation class that includes:

  • Single and multi-asset GBM simulation
  • Normal and fat-tailed distributions
  • Portfolio simulation with custom weights
  • Risk metrics calculation (VaR, ES)
  • Option pricing capabilities

Your implementation:

Exercise
Click to reveal solution
class MonteCarloEngine:
    """
    Comprehensive Monte Carlo simulation engine for financial analysis.

    Features:
    - Single and multi-asset price simulation
    - Normal and fat-tailed distributions
    - Portfolio simulation
    - Risk metrics calculation
    - Option pricing
    """

    def __init__(self, seed: int = None):
        """Initialize with optional seed for reproducibility."""
        self.seed = seed
        self.rng = np.random.default_rng(seed)

    def reset_seed(self):
        """Reset random number generator."""
        self.rng = np.random.default_rng(self.seed)

    def simulate_gbm(self, S0: float, mu: float, sigma: float, T: float,
                     n_steps: int, n_paths: int, fat_tails: bool = False,
                     df: int = 5) -> np.ndarray:
        """Simulate single-asset price paths."""
        dt = T / n_steps
        drift = (mu - 0.5 * sigma**2) * dt

        if fat_tails:
            scale = np.sqrt((df - 2) / df)
            diffusion = sigma * np.sqrt(dt) * scale
            Z = self.rng.standard_t(df, size=(n_steps, n_paths))
        else:
            diffusion = sigma * np.sqrt(dt)
            Z = self.rng.standard_normal((n_steps, n_paths))

        log_returns = drift + diffusion * Z
        log_prices = np.zeros((n_steps + 1, n_paths))
        log_prices[0] = np.log(S0)
        log_prices[1:] = np.log(S0) + np.cumsum(log_returns, axis=0)

        return np.exp(log_prices)

    def simulate_correlated(self, S0_vec: list, mu_vec: list, sigma_vec: list,
                           corr_matrix: np.ndarray, T: float,
                           n_steps: int, n_paths: int) -> dict:
        """Simulate correlated multi-asset paths."""
        n_assets = len(S0_vec)
        dt = T / n_steps
        L = cholesky(corr_matrix, lower=True)

        Z = self.rng.standard_normal((n_steps, n_assets, n_paths))
        corr_Z = np.zeros_like(Z)
        for t in range(n_steps):
            corr_Z[t] = L @ Z[t]

        paths = {}
        for i in range(n_assets):
            drift = (mu_vec[i] - 0.5 * sigma_vec[i]**2) * dt
            diffusion = sigma_vec[i] * np.sqrt(dt)
            log_returns = drift + diffusion * corr_Z[:, i, :]
            log_prices = np.zeros((n_steps + 1, n_paths))
            log_prices[0] = np.log(S0_vec[i])
            log_prices[1:] = np.log(S0_vec[i]) + np.cumsum(log_returns, axis=0)
            paths[i] = np.exp(log_prices)

        return paths

    def calculate_risk_metrics(self, returns: np.ndarray, 
                              confidence: float = 0.95) -> dict:
        """Calculate VaR and ES from simulated returns."""
        alpha = 1 - confidence
        threshold = np.percentile(returns, alpha * 100)
        var = -threshold
        es = -np.mean(returns[returns <= threshold])

        return {'var': var, 'es': es, 'confidence': confidence}

    def price_european_option(self, S0: float, K: float, T: float,
                             r: float, sigma: float, option_type: str = 'call',
                             n_paths: int = 100000) -> dict:
        """Price European option using Monte Carlo."""
        Z = self.rng.standard_normal(n_paths)
        ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

        if option_type == 'call':
            payoffs = np.maximum(ST - K, 0)
        else:
            payoffs = np.maximum(K - ST, 0)

        discounted = np.exp(-r * T) * payoffs
        price = np.mean(discounted)
        std_error = np.std(discounted) / np.sqrt(n_paths)

        return {'price': price, 'std_error': std_error,
                'ci_95': (price - 1.96*std_error, price + 1.96*std_error)}

    def summary_stats(self, paths: np.ndarray) -> dict:
        """Calculate summary statistics for simulation paths."""
        terminal = paths[-1, :]
        initial = paths[0, 0]
        returns = (terminal - initial) / initial

        return {
            'mean_terminal': np.mean(terminal),
            'median_terminal': np.median(terminal),
            'mean_return': np.mean(returns),
            'volatility': np.std(returns),
            'percentile_5': np.percentile(terminal, 5),
            'percentile_95': np.percentile(terminal, 95),
            'prob_gain': np.mean(terminal > initial)
        }

# Demo
mc = MonteCarloEngine(seed=42)

# Single asset simulation
paths = mc.simulate_gbm(S0=100, mu=0.08, sigma=0.20, T=1, n_steps=252, n_paths=10000)
stats = mc.summary_stats(paths)
print(f"Mean terminal: ${stats['mean_terminal']:.2f}")
print(f"Mean return: {stats['mean_return']*100:.1f}%")
print(f"Prob of gain: {stats['prob_gain']*100:.1f}%")

# Option pricing
mc.reset_seed()
option = mc.price_european_option(S0=100, K=100, T=0.5, r=0.05, sigma=0.20)
print(f"\nOption price: ${option['price']:.4f} ± ${option['std_error']:.4f}")

Module Project: Monte Carlo Simulator

Put together everything you've learned!

Your Challenge:

Build a complete Monte Carlo simulation system that includes:

  1. Multi-asset correlated price simulation with configurable distribution (normal or Student-t)
  2. Portfolio value projection with custom asset weights
  3. Comprehensive risk metrics (VaR, ES, probability of loss)
  4. Option pricing with Greeks estimation
  5. Summary report generation with visualization

# YOUR CODE HERE - Module Project
Click to reveal solution
class MonteCarloSimulator:
    """
    Complete Monte Carlo simulation system for portfolio analysis.
    """

    def __init__(self, seed: int = 42):
        self.seed = seed
        self.rng = np.random.default_rng(seed)
        self.results = {}

    def simulate_portfolio(self, assets: dict, corr_matrix: np.ndarray,
                          weights: dict, T: float = 1, n_steps: int = 252,
                          n_paths: int = 10000, initial_value: float = 100000,
                          fat_tails: bool = False, df: int = 5) -> dict:
        """
        Simulate portfolio value paths.

        Args:
            assets: Dict of {name: {'S0': price, 'mu': return, 'sigma': vol}}
            corr_matrix: Correlation matrix
            weights: Dict of {name: weight}
        """
        asset_names = list(assets.keys())
        n_assets = len(asset_names)
        dt = T / n_steps

        # Cholesky decomposition
        L = cholesky(corr_matrix, lower=True)

        # Generate correlated shocks
        if fat_tails:
            scale = np.sqrt((df - 2) / df)
            Z_raw = self.rng.standard_t(df, size=(n_steps, n_assets, n_paths))
        else:
            scale = 1.0
            Z_raw = self.rng.standard_normal((n_steps, n_assets, n_paths))

        Z = np.zeros_like(Z_raw)
        for t in range(n_steps):
            Z[t] = L @ Z_raw[t]

        # Simulate each asset
        asset_paths = {}
        for i, name in enumerate(asset_names):
            params = assets[name]
            drift = (params['mu'] - 0.5 * params['sigma']**2) * dt
            diffusion = params['sigma'] * np.sqrt(dt) * scale

            log_returns = drift + diffusion * Z[:, i, :]
            log_prices = np.zeros((n_steps + 1, n_paths))
            log_prices[0] = np.log(params['S0'])
            log_prices[1:] = np.log(params['S0']) + np.cumsum(log_returns, axis=0)
            asset_paths[name] = np.exp(log_prices)

        # Calculate portfolio value
        port_value = np.ones((n_steps + 1, n_paths))
        for name in asset_names:
            if name in weights:
                normalized = asset_paths[name] / asset_paths[name][0, :]
                port_value += weights[name] * (normalized - 1)

        port_value *= initial_value

        # Store results
        self.results = {
            'asset_paths': asset_paths,
            'portfolio_paths': port_value,
            'initial_value': initial_value,
            'terminal_values': port_value[-1, :]
        }

        return self.results

    def calculate_risk_metrics(self, confidence: float = 0.95) -> dict:
        """Calculate comprehensive risk metrics."""
        terminal = self.results['terminal_values']
        initial = self.results['initial_value']
        returns = (terminal - initial) / initial

        alpha = 1 - confidence
        threshold = np.percentile(returns, alpha * 100)
        var = -threshold
        es = -np.mean(returns[returns <= threshold])

        return {
            'mean_return': np.mean(returns),
            'volatility': np.std(returns),
            f'var_{int(confidence*100)}': var,
            f'es_{int(confidence*100)}': es,
            'prob_loss': np.mean(returns < 0),
            'prob_10pct_loss': np.mean(returns < -0.10),
            'prob_20pct_gain': np.mean(returns > 0.20)
        }

    def generate_report(self) -> None:
        """Generate summary report with visualization."""
        metrics = self.calculate_risk_metrics()

        print("="*60)
        print("MONTE CARLO SIMULATION REPORT")
        print("="*60)
        print(f"\nInitial Investment: ${self.results['initial_value']:,.0f}")
        print(f"\nRETURN STATISTICS")
        print(f"  Expected Return: {metrics['mean_return']*100:.1f}%")
        print(f"  Volatility: {metrics['volatility']*100:.1f}%")
        print(f"\nRISK METRICS (95%)")
        print(f"  Value at Risk: {metrics['var_95']*100:.1f}%")
        print(f"  Expected Shortfall: {metrics['es_95']*100:.1f}%")
        print(f"\nPROBABILITIES")
        print(f"  Probability of Loss: {metrics['prob_loss']*100:.1f}%")
        print(f"  Probability of >10% Loss: {metrics['prob_10pct_loss']*100:.1f}%")
        print(f"  Probability of >20% Gain: {metrics['prob_20pct_gain']*100:.1f}%")

        # Visualization
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))

        # Portfolio paths
        paths = self.results['portfolio_paths']
        n_steps = paths.shape[0]
        time_grid = np.linspace(0, 1, n_steps)

        for i in range(min(100, paths.shape[1])):
            axes[0].plot(time_grid, paths[:, i], alpha=0.05, color='steelblue')

        axes[0].fill_between(time_grid,
                            np.percentile(paths, 5, axis=1),
                            np.percentile(paths, 95, axis=1),
                            alpha=0.3, color='steelblue')
        axes[0].plot(time_grid, np.median(paths, axis=1), 'k-', linewidth=2)
        axes[0].axhline(self.results['initial_value'], color='red', linestyle='--')
        axes[0].set_xlabel('Time (years)')
        axes[0].set_ylabel('Portfolio Value ($)')
        axes[0].set_title('Portfolio Value Simulation')

        # Terminal distribution
        terminal = self.results['terminal_values']
        axes[1].hist(terminal/1000, bins=50, density=True, alpha=0.7, color='steelblue')
        axes[1].axvline(self.results['initial_value']/1000, color='red', linestyle='--', label='Initial')
        axes[1].axvline(np.mean(terminal)/1000, color='orange', linewidth=2, label='Mean')
        axes[1].set_xlabel('Terminal Value ($k)')
        axes[1].set_ylabel('Density')
        axes[1].set_title('Terminal Value Distribution')
        axes[1].legend()

        plt.tight_layout()
        plt.show()

# Demo
simulator = MonteCarloSimulator(seed=42)

# Define assets
assets_config = {
    'SPY': {'S0': 450, 'mu': 0.10, 'sigma': 0.18},
    'QQQ': {'S0': 380, 'mu': 0.12, 'sigma': 0.25},
    'TLT': {'S0': 100, 'mu': 0.04, 'sigma': 0.15},
    'GLD': {'S0': 180, 'mu': 0.05, 'sigma': 0.12}
}

corr = np.array([
    [1.00, 0.85, -0.30, 0.05],
    [0.85, 1.00, -0.25, 0.00],
    [-0.30, -0.25, 1.00, 0.25],
    [0.05, 0.00, 0.25, 1.00]
])

weights_config = {'SPY': 0.40, 'QQQ': 0.20, 'TLT': 0.30, 'GLD': 0.10}

# Run simulation
simulator.simulate_portfolio(assets_config, corr, weights_config,
                            T=1, n_paths=10000, initial_value=100000)
simulator.generate_report()
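
The project brief also asks for Greeks estimation, which the solution above leaves out. One common Monte Carlo approach is central finite differences with common random numbers: the same normal draws price both bumped scenarios, so most of the sampling noise cancels in the difference. A minimal sketch (the function name and bump sizes are illustrative, not part of the solution class):

```python
import numpy as np

def mc_delta_vega(S0: float, K: float, T: float, r: float, sigma: float,
                  n_paths: int = 200_000, bump: float = 0.01, seed: int = 42) -> dict:
    """Estimate European call delta and vega by central finite differences."""
    # Common random numbers: reuse the same draws for every bumped scenario
    Z = np.random.default_rng(seed).standard_normal(n_paths)

    def call_price(S, vol):
        # Terminal price under risk-neutral GBM, then discounted mean payoff
        ST = S * np.exp((r - 0.5 * vol**2) * T + vol * np.sqrt(T) * Z)
        return np.exp(-r * T) * np.maximum(ST - K, 0).mean()

    dS = S0 * bump
    dv = sigma * bump
    delta = (call_price(S0 + dS, sigma) - call_price(S0 - dS, sigma)) / (2 * dS)
    vega = (call_price(S0, sigma + dv) - call_price(S0, sigma - dv)) / (2 * dv)
    return {'delta': delta, 'vega': vega}

greeks = mc_delta_vega(S0=100, K=100, T=0.5, r=0.05, sigma=0.20)
print(f"Delta: {greeks['delta']:.3f}, Vega: {greeks['vega']:.3f}")
```

Without common random numbers, each bumped price would use independent noise and the finite-difference estimate would be dominated by simulation error.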

Key Takeaways

What You Learned

1. Random Number Generation

  • Use seeds for reproducibility and audit trails
  • Modern NumPy Generator objects provide better control
  • Different distributions model different phenomena
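
A quick check of the reproducibility point: two `Generator` objects built from the same seed produce identical streams.

```python
import numpy as np

# Same seed, same sequence of calls -> identical draws (audit trail);
# a generator with a different (or no) seed would diverge.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.standard_normal(5)
draws_b = rng_b.standard_normal(5)

print(np.allclose(draws_a, draws_b))  # True
```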

2. Price Path Simulation

  • GBM is the standard model but assumes normal returns
  • Fat-tailed distributions better capture extreme events
  • Terminal price distribution is log-normal (positively skewed)
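
Both GBM properties can be verified by simulation: the terminal price has mean S0·exp(μT) and a positively skewed (log-normal) distribution. A minimal sketch:

```python
import numpy as np

# Under GBM, log(S_T) is normal, so S_T is log-normal with
# E[S_T] = S0 * exp(mu * T). Check mean and skew by simulation.
rng = np.random.default_rng(seed=0)
S0, mu, sigma, T = 100.0, 0.08, 0.20, 1.0

Z = rng.standard_normal(200_000)
ST = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)

skew = ((ST - ST.mean())**3).mean() / ST.std()**3
print(f"Simulated mean:   {ST.mean():.2f}")
print(f"Theoretical mean: {S0 * np.exp(mu * T):.2f}")
print(f"Skewness of S_T:  {skew:.2f}")  # positive
```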

3. Correlated Simulations

  • Cholesky decomposition transforms independent normals to correlated
  • Correlation structure is preserved in multi-asset simulations
  • Portfolio risk depends on both individual volatilities and correlations
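
The Cholesky point is easy to verify empirically: multiplying independent standard normals by the Cholesky factor of a correlation matrix recovers that correlation in the sample.

```python
import numpy as np

# L is the lower-triangular Cholesky factor of the target correlation
# matrix; L @ Z turns independent normals into correlated shocks.
rng = np.random.default_rng(seed=1)
target = np.array([[1.0, 0.6],
                   [0.6, 1.0]])
L = np.linalg.cholesky(target)

Z = rng.standard_normal((2, 100_000))   # independent draws
corr_Z = L @ Z                          # correlated draws

print(np.corrcoef(corr_Z).round(2))     # close to `target`
```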

4. Applications

  • Monte Carlo can price complex options that are analytically intractable
  • Portfolio projections show probability distributions, not point estimates
  • Retirement planning benefits from probability-based thinking

Coming Up Next

In Module 11: Performance Attribution, we'll explore:

  • Decomposing portfolio returns into components
  • Brinson attribution (allocation vs selection)
  • Factor-based attribution
  • Risk attribution and budgeting


Congratulations on completing Module 10!

Module 11: Performance Attribution

Course 3: Quantitative Finance & Portfolio Theory
Part 4: Simulation & Analytics


Learning Objectives

By the end of this module, you will be able to:

  1. Decompose portfolio returns using Brinson attribution
  2. Apply factor-based attribution using regression analysis
  3. Calculate risk attribution and contribution metrics
  4. Build comprehensive attribution reports
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 4-6: Portfolio Theory, Module 9: Factor Models

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Module 11: Performance Attribution - Ready!')

Load Data

# Download sector ETF data for attribution analysis
sector_etfs = {
    'XLK': 'Technology', 'XLF': 'Financials', 'XLV': 'Healthcare',
    'XLE': 'Energy', 'XLY': 'Consumer Disc', 'XLP': 'Consumer Staples',
    'XLI': 'Industrials', 'XLU': 'Utilities', 'XLB': 'Materials'
}

tickers = list(sector_etfs.keys()) + ['SPY']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)

# Fall back to 'Close' when 'Adj Close' is absent (newer yfinance defaults)
if isinstance(data.columns, pd.MultiIndex):
    level0 = data.columns.get_level_values(0)
    prices = data['Adj Close'] if 'Adj Close' in level0 else data['Close']
else:
    prices = data['Adj Close'] if 'Adj Close' in data.columns else data['Close']

returns = prices.pct_change().dropna()
print(f'Data loaded: {len(returns)} days')
print(f'Sectors: {len(sector_etfs)}')

Section 11.1: Attribution Basics

Performance attribution answers the crucial question: "Why did the portfolio perform the way it did?"

In this section, you will learn:

  • Why attribution matters for portfolio management
  • Active return and tracking error
  • Information ratio for skill measurement

11.1.1 Understanding Active Management

Active return = Portfolio return - Benchmark return

# Define portfolio and benchmark weights
benchmark_weights = {
    'XLK': 0.28, 'XLF': 0.13, 'XLV': 0.13, 'XLE': 0.04,
    'XLY': 0.10, 'XLP': 0.07, 'XLI': 0.09, 'XLU': 0.03, 'XLB': 0.03
}

portfolio_weights = {
    'XLK': 0.35, 'XLF': 0.08, 'XLV': 0.15, 'XLE': 0.02,
    'XLY': 0.12, 'XLP': 0.05, 'XLI': 0.10, 'XLU': 0.03, 'XLB': 0.05
}

# Calculate returns
sector_tickers = list(sector_etfs.keys())
sector_returns = returns[sector_tickers]

port_weights = np.array([portfolio_weights[t] for t in sector_tickers])
bench_weights = np.array([benchmark_weights[t] for t in sector_tickers])

portfolio_returns = (sector_returns * port_weights).sum(axis=1)
benchmark_returns = returns['SPY']
active_returns = portfolio_returns - benchmark_returns

# Performance summary
print("Performance Summary (Annualized)")
print("="*50)
print(f"Portfolio Return:  {portfolio_returns.mean() * 252 * 100:.2f}%")
print(f"Benchmark Return:  {benchmark_returns.mean() * 252 * 100:.2f}%")
print(f"Active Return:     {active_returns.mean() * 252 * 100:.2f}%")
print(f"\nTracking Error:    {active_returns.std() * np.sqrt(252) * 100:.2f}%")
print(f"Information Ratio: {(active_returns.mean() * 252) / (active_returns.std() * np.sqrt(252)):.2f}")

Exercise 11.1: Active Return Analysis (Guided)

Your Task: Calculate comprehensive active return statistics including hit rate and win/loss ratio.

Fill in the blanks to complete the analysis:

Exercise
Click to reveal solution
def analyze_active_returns(portfolio_ret: pd.Series, benchmark_ret: pd.Series) -> dict:
    """
    Analyze active return characteristics.
    """
    active = portfolio_ret - benchmark_ret

    # Calculate the mean of active returns
    mean_active = active.mean()

    # Calculate hit rate (proportion of positive active returns)
    hit_rate = (active > 0).mean()

    # Calculate average win and average loss
    avg_win = active[active > 0].mean()
    avg_loss = active[active < 0].mean()

    # Calculate win/loss ratio
    win_loss_ratio = abs(avg_win) / abs(avg_loss)

    return {
        'mean_active': mean_active * 252,
        'tracking_error': active.std() * np.sqrt(252),
        'information_ratio': (mean_active * 252) / (active.std() * np.sqrt(252)),
        'hit_rate': hit_rate,
        'win_loss_ratio': win_loss_ratio
    }

# Test
metrics = analyze_active_returns(portfolio_returns, benchmark_returns)
print(f"Hit Rate: {metrics['hit_rate']*100:.1f}%")
print(f"Win/Loss Ratio: {metrics['win_loss_ratio']:.2f}")
print(f"Information Ratio: {metrics['information_ratio']:.2f}")

Section 11.2: Brinson Attribution

The Brinson model decomposes active return into allocation, selection, and interaction effects.

In this section, you will learn:

  • Allocation effect: Value from sector weighting decisions
  • Selection effect: Value from security selection within sectors
  • Interaction effect: Combined impact

def brinson_attribution(portfolio_weights: dict, benchmark_weights: dict,
                        portfolio_returns: dict, benchmark_sector_returns: dict,
                        benchmark_total_return: float) -> pd.DataFrame:
    """
    Calculate Brinson attribution for a single period.
    
    Allocation Effect: (wp - wb) * (rb - rb_total)
    Selection Effect: wb * (rp - rb)
    Interaction Effect: (wp - wb) * (rp - rb)
    """
    results = []
    
    for sector in portfolio_weights.keys():
        wp = portfolio_weights[sector]
        wb = benchmark_weights[sector]
        rp = portfolio_returns[sector]
        rb = benchmark_sector_returns[sector]
        rb_total = benchmark_total_return
        
        allocation = (wp - wb) * (rb - rb_total)
        selection = wb * (rp - rb)
        interaction = (wp - wb) * (rp - rb)
        total = allocation + selection + interaction
        
        results.append({
            'Sector': sector,
            'Active Weight': wp - wb,
            'Allocation': allocation,
            'Selection': selection,
            'Interaction': interaction,
            'Total': total
        })
    
    return pd.DataFrame(results)

# Calculate cumulative returns for the period
cum_returns = (1 + sector_returns).prod() - 1
benchmark_cum = (1 + benchmark_returns).prod() - 1

# Add simulated selection alpha
np.random.seed(42)
selection_alpha = {ticker: np.random.normal(0, 0.03) for ticker in sector_tickers}

portfolio_sector_returns = {t: float(cum_returns[t]) + selection_alpha[t] for t in sector_tickers}
benchmark_sector_returns = {t: float(cum_returns[t]) for t in sector_tickers}

# Run attribution
attribution_df = brinson_attribution(
    portfolio_weights, benchmark_weights,
    portfolio_sector_returns, benchmark_sector_returns,
    float(benchmark_cum)
)

print("Brinson Attribution Results")
print("="*70)
print(f"Total Allocation:  {attribution_df['Allocation'].sum()*100:.2f}%")
print(f"Total Selection:   {attribution_df['Selection'].sum()*100:.2f}%")
print(f"Total Interaction: {attribution_df['Interaction'].sum()*100:.2f}%")
print(f"Total Active:      {attribution_df['Total'].sum()*100:.2f}%")

Exercise 11.2: Monthly Brinson Attribution (Guided)

Your Task: Calculate Brinson attribution on a monthly basis to track attribution over time.

Fill in the blanks to complete the time-series attribution:

Exercise
Click to reveal solution
def monthly_brinson(sector_returns: pd.DataFrame, benchmark_returns: pd.Series,
                    portfolio_weights: dict, benchmark_weights: dict) -> pd.DataFrame:
    """
    Calculate monthly Brinson attribution.
    """
    # Resample to monthly returns using compound formula
    monthly_sector = sector_returns.resample('M').apply(lambda x: (1 + x).prod() - 1)
    monthly_bench = benchmark_returns.resample('M').apply(lambda x: (1 + x).prod() - 1)

    results = []
    for date in monthly_sector.index:
        sector_ret = monthly_sector.loc[date].to_dict()
        bench_ret = float(monthly_bench.loc[date])

        # Simulate selection alpha
        np.random.seed(int(date.timestamp()) % 10000)
        port_ret = {t: sector_ret[t] + np.random.normal(0, 0.01) for t in sector_ret}

        # Calculate allocation effect
        allocation = sum([
            (portfolio_weights[s] - benchmark_weights[s]) * (sector_ret[s] - bench_ret) 
            for s in portfolio_weights
        ])

        # Calculate selection effect
        selection = sum([
            benchmark_weights[s] * (port_ret[s] - sector_ret[s]) 
            for s in portfolio_weights
        ])

        results.append({
            'Date': date,
            'Allocation': allocation,
            'Selection': selection,
            'Total': allocation + selection
        })

    return pd.DataFrame(results).set_index('Date')

# Test
monthly_attr = monthly_brinson(sector_returns, benchmark_returns, portfolio_weights, benchmark_weights)
print(f"Mean Monthly Allocation: {monthly_attr['Allocation'].mean()*100:.3f}%")
print(f"Mean Monthly Selection: {monthly_attr['Selection'].mean()*100:.3f}%")
print(f"Allocation Hit Rate: {(monthly_attr['Allocation'] > 0).mean()*100:.1f}%")

Section 11.3: Factor Attribution

Factor attribution uses regression to decompose returns by systematic factor exposures.

In this section, you will learn:

  • Regression-based factor attribution
  • Alpha and beta decomposition
  • Factor contribution analysis

# Create factor proxies
factor_tickers = ['IWM', 'IWF', 'IWD']  # Small cap, Growth, Value
factor_data = yf.download(factor_tickers + ['SPY'], start='2020-01-01', end='2024-01-01', progress=False)

if isinstance(factor_data.columns, pd.MultiIndex):
    factor_prices = factor_data['Adj Close'] if 'Adj Close' in factor_data.columns.get_level_values(0) else factor_data['Close']
else:
    factor_prices = factor_data['Adj Close'] if 'Adj Close' in factor_data.columns else factor_data['Close']

factor_returns = factor_prices.pct_change().dropna()

# Create factor returns (excess over market)
factors_df = pd.DataFrame({
    'Market': factor_returns['SPY'],
    'SmallCap': factor_returns['IWM'] - factor_returns['SPY'],
    'Value': factor_returns['IWD'] - factor_returns['IWF']
})

print("Factor Statistics:")
print(factors_df.describe().round(4))

def factor_attribution(portfolio_returns: pd.Series, factors: pd.DataFrame) -> dict:
    """
    Perform factor attribution using OLS regression.
    
    Returns:
        Dictionary with alpha, betas, and factor contributions
    """
    # Align data
    common_idx = portfolio_returns.index.intersection(factors.index)
    y = portfolio_returns.loc[common_idx]
    X = factors.loc[common_idx]
    
    # Add constant for alpha
    X_const = np.column_stack([np.ones(len(X)), X.values])
    
    # OLS regression
    coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]
    
    alpha = coeffs[0]
    betas = dict(zip(X.columns, coeffs[1:]))
    
    # Factor contributions
    factor_contrib = {f: betas[f] * X[f].mean() * 252 for f in betas}
    
    # R-squared
    y_pred = X_const @ coeffs
    ss_res = np.sum((y.values - y_pred)**2)
    ss_tot = np.sum((y.values - np.mean(y.values))**2)
    r_squared = 1 - ss_res / ss_tot
    
    return {
        'alpha': alpha * 252,
        'betas': betas,
        'factor_contributions': factor_contrib,
        'r_squared': r_squared,
        'total_return': y.mean() * 252
    }

# Run factor attribution
common_dates = portfolio_returns.index.intersection(factors_df.index)
port_aligned = portfolio_returns.loc[common_dates]

factor_attr = factor_attribution(port_aligned, factors_df)

print("Factor Attribution Results")
print("="*50)
print(f"Alpha (annualized): {factor_attr['alpha']*100:.2f}%")
print(f"R-squared: {factor_attr['r_squared']*100:.1f}%")
print(f"\nFactor Betas:")
for factor, beta in factor_attr['betas'].items():
    print(f"  {factor}: {beta:.3f}")
print(f"\nFactor Contributions (annualized):")
for factor, contrib in factor_attr['factor_contributions'].items():
    print(f"  {factor}: {contrib*100:.2f}%")

Exercise 11.3: Rolling Factor Attribution (Guided)

Your Task: Calculate rolling factor exposures to track style drift over time.

Fill in the blanks to complete the rolling analysis:

Exercise
Click to reveal solution
def rolling_factor_betas(portfolio_returns: pd.Series, factors: pd.DataFrame,
                         window: int = 60) -> pd.DataFrame:
    """
    Calculate rolling factor betas.
    """
    common_idx = portfolio_returns.index.intersection(factors.index)
    y = portfolio_returns.loc[common_idx]
    X = factors.loc[common_idx]

    results = []

    # Loop through dates starting from window-1 index
    for i in range(window - 1, len(y)):
        # Get window of data
        y_window = y.iloc[i - window + 1:i + 1]
        X_window = X.iloc[i - window + 1:i + 1]

        # Add constant and run regression
        X_const = np.column_stack([np.ones(len(X_window)), X_window.values])
        coeffs = np.linalg.lstsq(X_const, y_window.values, rcond=None)[0]

        row = {'Date': y.index[i], 'Alpha': coeffs[0] * 252}
        for j, col in enumerate(X.columns):
            row[col] = coeffs[j + 1]
        results.append(row)

    return pd.DataFrame(results).set_index('Date')

# Test
rolling_betas = rolling_factor_betas(port_aligned, factors_df, window=60)
print(f"Average Market Beta: {rolling_betas['Market'].mean():.2f}")
print(f"Market Beta Range: {rolling_betas['Market'].min():.2f} to {rolling_betas['Market'].max():.2f}")
print(f"Average Alpha: {rolling_betas['Alpha'].mean()*100:.2f}%")

Section 11.4: Risk Attribution

Risk attribution decomposes portfolio risk into contributions from each position.

In this section, you will learn:

  • Marginal and component contribution to risk
  • Risk budgeting analysis
  • Diversification measurement

def risk_attribution(weights: np.ndarray, cov_matrix: np.ndarray) -> dict:
    """
    Calculate risk attribution metrics.
    
    MCR = (Sigma @ w) / sigma_p
    CCR = w * MCR
    PCR = CCR / sigma_p
    """
    port_var = weights @ cov_matrix @ weights
    port_vol = np.sqrt(port_var)
    
    # Marginal contribution
    mcr = (cov_matrix @ weights) / port_vol
    
    # Component contribution
    ccr = weights * mcr
    
    # Percentage contribution
    pcr = ccr / port_vol
    
    return {
        'portfolio_vol': port_vol,
        'marginal_risk': mcr,
        'component_risk': ccr,
        'percent_risk': pcr
    }

# Calculate risk attribution
cov_matrix = sector_returns.cov() * 252
weights_array = np.array([portfolio_weights[t] for t in sector_tickers])

risk_attr = risk_attribution(weights_array, cov_matrix.values)

print(f"Portfolio Volatility: {risk_attr['portfolio_vol']*100:.2f}%")
print(f"\nRisk Contribution by Sector:")
for i, ticker in enumerate(sector_tickers):
    print(f"  {sector_etfs[ticker]:18}: Weight {weights_array[i]*100:5.1f}% -> Risk {risk_attr['percent_risk'][i]*100:5.1f}%")

Exercise 11.4: Complete Attribution System (Open-ended)

Your Task:

Build a function that:

  • Combines Brinson and factor attribution
  • Includes risk contribution analysis
  • Returns a comprehensive attribution report

Your implementation:

Exercise
Click to reveal solution
def comprehensive_attribution(portfolio_returns: pd.Series, benchmark_returns: pd.Series,
                             sector_returns: pd.DataFrame, factors: pd.DataFrame,
                             portfolio_weights: dict, benchmark_weights: dict) -> dict:
    """
    Comprehensive attribution combining multiple methods.
    """
    # Active return analysis
    active = portfolio_returns - benchmark_returns
    active_metrics = {
        'active_return': active.mean() * 252,
        'tracking_error': active.std() * np.sqrt(252),
        'information_ratio': (active.mean() * 252) / (active.std() * np.sqrt(252))
    }

    # Factor attribution
    common_idx = portfolio_returns.index.intersection(factors.index)
    y = portfolio_returns.loc[common_idx]
    X = factors.loc[common_idx]
    X_const = np.column_stack([np.ones(len(X)), X.values])
    coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]

    factor_metrics = {
        'alpha': coeffs[0] * 252,
        'betas': dict(zip(X.columns, coeffs[1:]))
    }

    # Risk attribution
    sector_tickers = list(portfolio_weights.keys())
    cov_matrix = sector_returns[sector_tickers].cov() * 252
    weights = np.array([portfolio_weights[t] for t in sector_tickers])
    port_vol = np.sqrt(weights @ cov_matrix.values @ weights)
    mcr = (cov_matrix.values @ weights) / port_vol
    pcr = (weights * mcr) / port_vol

    risk_metrics = {
        'portfolio_vol': port_vol,
        'risk_contributions': dict(zip(sector_tickers, pcr))
    }

    return {
        'active_metrics': active_metrics,
        'factor_metrics': factor_metrics,
        'risk_metrics': risk_metrics
    }

# Test
report = comprehensive_attribution(
    portfolio_returns, benchmark_returns, sector_returns, factors_df,
    portfolio_weights, benchmark_weights
)

print("COMPREHENSIVE ATTRIBUTION REPORT")
print("="*50)
print(f"\nACTIVE RETURN ANALYSIS")
print(f"  Active Return: {report['active_metrics']['active_return']*100:.2f}%")
print(f"  Tracking Error: {report['active_metrics']['tracking_error']*100:.2f}%")
print(f"  Information Ratio: {report['active_metrics']['information_ratio']:.2f}")
print(f"\nFACTOR ATTRIBUTION")
print(f"  Alpha: {report['factor_metrics']['alpha']*100:.2f}%")
for f, b in report['factor_metrics']['betas'].items():
    print(f"  {f} Beta: {b:.3f}")
print(f"\nRISK ATTRIBUTION")
print(f"  Portfolio Vol: {report['risk_metrics']['portfolio_vol']*100:.2f}%")

Exercise 11.5: Attribution Visualization (Open-ended)

Your Task:

Build a function that creates professional attribution visualizations:

  • Waterfall chart for Brinson effects
  • Bar chart comparing weight vs risk contribution
  • Time series of rolling attribution

Your implementation:

Exercise
Click to reveal solution
def plot_attribution_dashboard(attribution_df: pd.DataFrame, 
                              risk_attr: dict, sector_names: dict) -> None:
    """
    Create comprehensive attribution visualizations.
    """
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # 1. Brinson waterfall
    ax1 = axes[0, 0]
    components = ['Allocation', 'Selection', 'Interaction']
    values = [attribution_df[c].sum() * 100 for c in components]
    colors = ['green' if v > 0 else 'red' for v in values]
    ax1.bar(components, values, color=colors, alpha=0.7)
    ax1.axhline(0, color='black', linewidth=0.5)
    ax1.set_ylabel('Contribution (%)')
    ax1.set_title('Brinson Attribution Components')
    for i, v in enumerate(values):
        ax1.text(i, v + 0.2 * np.sign(v), f'{v:.2f}%', ha='center')

    # 2. Weight vs Risk by sector
    ax2 = axes[0, 1]
    sectors = [sector_names[t] for t in attribution_df['Sector']]
    x = np.arange(len(sectors))
    width = 0.35
    weights = attribution_df['Active Weight'].values + 0.1  # Approximate total weight: active weight plus an assumed 10% benchmark weight per sector
    risk_pcts = risk_attr['percent_risk']
    ax2.bar(x - width/2, weights * 100, width, label='Weight', color='steelblue')
    ax2.bar(x + width/2, risk_pcts * 100, width, label='Risk Contribution', color='orange')
    ax2.set_xticks(x)
    ax2.set_xticklabels(sectors, rotation=45, ha='right')
    ax2.set_ylabel('Percentage (%)')
    ax2.set_title('Weight vs Risk Contribution')
    ax2.legend()

    # 3. Attribution by sector
    ax3 = axes[1, 0]
    total_by_sector = attribution_df['Total'].values * 100
    colors = ['green' if v > 0 else 'red' for v in total_by_sector]
    ax3.barh(sectors, total_by_sector, color=colors, alpha=0.7)
    ax3.axvline(0, color='black', linewidth=0.5)
    ax3.set_xlabel('Total Attribution (%)')
    ax3.set_title('Attribution by Sector')

    # 4. Risk contribution pie
    ax4 = axes[1, 1]
    risk_sorted = sorted(zip(sectors, risk_pcts), key=lambda x: x[1], reverse=True)
    labels = [s for s, _ in risk_sorted[:5]] + ['Other']
    sizes = [r for _, r in risk_sorted[:5]] + [sum(r for _, r in risk_sorted[5:])]
    ax4.pie(sizes, labels=labels, autopct='%1.1f%%')
    ax4.set_title('Risk Contribution Distribution')

    plt.tight_layout()
    plt.show()

# Test
plot_attribution_dashboard(attribution_df, risk_attr, sector_etfs)

Exercise 11.6: Attribution Report Generator (Open-ended)

Your Task:

Build a class that generates a complete attribution report including:

  • Executive summary with key metrics
  • Detailed Brinson attribution by sector
  • Factor exposure analysis
  • Risk decomposition
  • Time series analysis

Your implementation:

Exercise
Click to reveal solution
class AttributionReportGenerator:
    """
    Comprehensive attribution report generator.
    """

    def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series):
        self.portfolio_returns = portfolio_returns
        self.benchmark_returns = benchmark_returns
        common_idx = portfolio_returns.index.intersection(benchmark_returns.index)
        self.active_returns = portfolio_returns.loc[common_idx] - benchmark_returns.loc[common_idx]

    def calculate_basic_stats(self) -> dict:
        """Calculate basic performance statistics."""
        return {
            'portfolio_return': self.portfolio_returns.mean() * 252,
            'benchmark_return': self.benchmark_returns.mean() * 252,
            'active_return': self.active_returns.mean() * 252,
            'tracking_error': self.active_returns.std() * np.sqrt(252),
            'information_ratio': (self.active_returns.mean() * 252) / 
                                 (self.active_returns.std() * np.sqrt(252)),
            'hit_rate': (self.active_returns > 0).mean()
        }

    def factor_attribution(self, factors: pd.DataFrame) -> dict:
        """Factor-based attribution."""
        common_idx = self.portfolio_returns.index.intersection(factors.index)
        y = self.portfolio_returns.loc[common_idx]
        X = factors.loc[common_idx]
        X_const = np.column_stack([np.ones(len(X)), X.values])
        coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]

        y_pred = X_const @ coeffs
        r_squared = 1 - np.sum((y - y_pred)**2) / np.sum((y - y.mean())**2)

        return {
            'alpha': coeffs[0] * 252,
            'betas': dict(zip(X.columns, coeffs[1:])),
            'r_squared': r_squared
        }

    def generate_report(self, factors: pd.DataFrame = None) -> None:
        """Generate formatted attribution report."""
        stats = self.calculate_basic_stats()

        print("\n" + "="*60)
        print("PERFORMANCE ATTRIBUTION REPORT")
        print("="*60)

        print("\n--- PERFORMANCE SUMMARY ---")
        print(f"Portfolio Return:  {stats['portfolio_return']*100:>8.2f}%")
        print(f"Benchmark Return:  {stats['benchmark_return']*100:>8.2f}%")
        print(f"Active Return:     {stats['active_return']*100:>8.2f}%")

        print("\n--- RISK METRICS ---")
        print(f"Tracking Error:    {stats['tracking_error']*100:>8.2f}%")
        print(f"Information Ratio: {stats['information_ratio']:>8.2f}")
        print(f"Hit Rate:          {stats['hit_rate']*100:>8.1f}%")

        if factors is not None:
            factor_stats = self.factor_attribution(factors)
            print("\n--- FACTOR ATTRIBUTION ---")
            print(f"Alpha:             {factor_stats['alpha']*100:>8.2f}%")
            print(f"R-squared:         {factor_stats['r_squared']*100:>8.1f}%")
            print("\nFactor Betas:")
            for factor, beta in factor_stats['betas'].items():
                print(f"  {factor:15}: {beta:>8.3f}")

        print("\n" + "="*60)

# Test
report_gen = AttributionReportGenerator(portfolio_returns, benchmark_returns)
report_gen.generate_report(factors_df)

Module Project: Attribution System

Put together everything you've learned!

Your Challenge:

Build a complete performance attribution system that includes:

  1. Brinson attribution (allocation, selection, interaction)
  2. Factor attribution with rolling exposures
  3. Risk attribution and contribution analysis
  4. Time series attribution tracking
  5. Professional report generation

# YOUR CODE HERE - Module Project
Click to reveal solution
class PerformanceAttributionSystem:
    """
    Complete performance attribution system.
    """

    def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series):
        common_idx = portfolio_returns.index.intersection(benchmark_returns.index)
        self.portfolio_returns = portfolio_returns.loc[common_idx]
        self.benchmark_returns = benchmark_returns.loc[common_idx]
        self.active_returns = self.portfolio_returns - self.benchmark_returns

    def basic_metrics(self) -> dict:
        """Calculate basic performance metrics."""
        return {
            'portfolio_return': self.portfolio_returns.mean() * 252,
            'benchmark_return': self.benchmark_returns.mean() * 252,
            'active_return': self.active_returns.mean() * 252,
            'tracking_error': self.active_returns.std() * np.sqrt(252),
            'information_ratio': (self.active_returns.mean() * 252) / 
                                 (self.active_returns.std() * np.sqrt(252)),
            'sharpe_portfolio': (self.portfolio_returns.mean() * 252) /
                               (self.portfolio_returns.std() * np.sqrt(252)),
            'hit_rate': (self.active_returns > 0).mean()
        }

    def brinson_attribution(self, portfolio_weights: dict, benchmark_weights: dict,
                           sector_returns: pd.DataFrame) -> pd.DataFrame:
        """Single-period Brinson attribution."""
        cum_sector = (1 + sector_returns).prod() - 1
        cum_bench = (1 + self.benchmark_returns).prod() - 1

        results = []
        for sector in portfolio_weights:
            wp = portfolio_weights[sector]
            wb = benchmark_weights.get(sector, 0)
            rb = float(cum_sector.get(sector, 0))
            rp = rb + np.random.normal(0, 0.02)  # Simulated selection

            allocation = (wp - wb) * (rb - cum_bench)
            selection = wb * (rp - rb)
            interaction = (wp - wb) * (rp - rb)

            results.append({
                'Sector': sector,
                'Allocation': allocation,
                'Selection': selection,
                'Interaction': interaction,
                'Total': allocation + selection + interaction
            })

        return pd.DataFrame(results)

    def factor_attribution(self, factors: pd.DataFrame) -> dict:
        """Factor-based attribution."""
        common_idx = self.portfolio_returns.index.intersection(factors.index)
        y = self.portfolio_returns.loc[common_idx]
        X = factors.loc[common_idx]
        X_const = np.column_stack([np.ones(len(X)), X.values])
        coeffs = np.linalg.lstsq(X_const, y.values, rcond=None)[0]

        y_pred = X_const @ coeffs
        r_squared = 1 - np.sum((y - y_pred)**2) / np.sum((y - y.mean())**2)

        return {
            'alpha': coeffs[0] * 252,
            'betas': dict(zip(X.columns, coeffs[1:])),
            'factor_contrib': {f: coeffs[i+1] * X[f].mean() * 252 
                              for i, f in enumerate(X.columns)},
            'r_squared': r_squared
        }

    def risk_attribution(self, weights: dict, cov_matrix: pd.DataFrame) -> dict:
        """Risk contribution analysis."""
        assets = list(weights.keys())
        w = np.array([weights[a] for a in assets])
        cov = cov_matrix.loc[assets, assets].values

        port_vol = np.sqrt(w @ cov @ w)
        mcr = (cov @ w) / port_vol
        ccr = w * mcr
        pcr = ccr / port_vol

        return {
            'portfolio_vol': port_vol,
            'risk_contributions': dict(zip(assets, pcr))
        }

    def generate_report(self, portfolio_weights: dict = None, 
                       benchmark_weights: dict = None,
                       sector_returns: pd.DataFrame = None,
                       factors: pd.DataFrame = None) -> None:
        """Generate comprehensive attribution report."""
        print("\n" + "="*70)
        print("PERFORMANCE ATTRIBUTION REPORT")
        print("="*70)

        # Basic metrics
        metrics = self.basic_metrics()
        print("\n--- PERFORMANCE SUMMARY ---")
        print(f"Portfolio Return:  {metrics['portfolio_return']*100:>10.2f}%")
        print(f"Benchmark Return:  {metrics['benchmark_return']*100:>10.2f}%")
        print(f"Active Return:     {metrics['active_return']*100:>10.2f}%")
        print(f"Tracking Error:    {metrics['tracking_error']*100:>10.2f}%")
        print(f"Information Ratio: {metrics['information_ratio']:>10.2f}")
        print(f"Hit Rate:          {metrics['hit_rate']*100:>10.1f}%")

        # Brinson attribution
        if portfolio_weights and benchmark_weights and sector_returns is not None:
            brinson = self.brinson_attribution(portfolio_weights, benchmark_weights, sector_returns)
            print("\n--- BRINSON ATTRIBUTION ---")
            print(f"Allocation Effect:  {brinson['Allocation'].sum()*100:>10.2f}%")
            print(f"Selection Effect:   {brinson['Selection'].sum()*100:>10.2f}%")
            print(f"Interaction Effect: {brinson['Interaction'].sum()*100:>10.2f}%")

        # Factor attribution
        if factors is not None:
            factor_attr = self.factor_attribution(factors)
            print("\n--- FACTOR ATTRIBUTION ---")
            print(f"Alpha:             {factor_attr['alpha']*100:>10.2f}%")
            print(f"R-squared:         {factor_attr['r_squared']*100:>10.1f}%")
            print("\nFactor Exposures:")
            for f, b in factor_attr['betas'].items():
                print(f"  {f:15}: {b:>10.3f}")

        print("\n" + "="*70)

# Demo
system = PerformanceAttributionSystem(portfolio_returns, benchmark_returns)
system.generate_report(
    portfolio_weights=portfolio_weights,
    benchmark_weights=benchmark_weights,
    sector_returns=sector_returns,
    factors=factors_df
)

Key Takeaways

What You Learned

1. Attribution Basics

  • Active return = portfolio return - benchmark return
  • Information Ratio measures risk-adjusted active return
  • Hit rate and win/loss ratio indicate consistency
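These metrics take only a few lines to compute; a minimal sketch on simulated daily return series (so the printed values are illustrative only):

```python
import numpy as np
import pandas as pd

# Simulated daily returns for portfolio and benchmark (illustrative only)
rng = np.random.default_rng(42)
idx = pd.bdate_range('2023-01-02', periods=252)
port = pd.Series(rng.normal(0.0006, 0.01, 252), index=idx)
bench = pd.Series(rng.normal(0.0004, 0.01, 252), index=idx)

active = port - bench
active_return = active.mean() * 252              # Annualized active return
tracking_error = active.std() * np.sqrt(252)     # Annualized tracking error
information_ratio = active_return / tracking_error
hit_rate = (active > 0).mean()                   # Fraction of days beating the benchmark

print(f"Active return: {active_return:.2%}, TE: {tracking_error:.2%}, "
      f"IR: {information_ratio:.2f}, Hit rate: {hit_rate:.1%}")
```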

2. Brinson Attribution

  • Allocation effect: Value from sector weighting
  • Selection effect: Value from security selection
  • Interaction effect: Combined allocation and selection
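For a single sector the three effects can be computed by hand; a worked example with made-up weights and returns, verifying that the effects sum to the sector's total active contribution:

```python
# Hypothetical single-sector Brinson decomposition (all numbers made up)
wp, wb = 0.30, 0.20        # Portfolio and benchmark sector weights
rp, rb = 0.12, 0.08        # Portfolio and benchmark sector returns
rb_total = 0.06            # Total benchmark return

allocation = (wp - wb) * (rb - rb_total)   # Reward for overweighting an outperforming sector
selection = wb * (rp - rb)                 # Reward for picking better securities in the sector
interaction = (wp - wb) * (rp - rb)        # Cross term of the two active decisions

total = allocation + selection + interaction
# The three effects sum to wp*rp - wb*rb - (wp - wb)*rb_total by construction
assert abs(total - (wp * rp - wb * rb - (wp - wb) * rb_total)) < 1e-9
```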

3. Factor Attribution

  • Regression decomposes returns by factor exposure
  • Alpha is factor-adjusted excess return
  • R-squared shows explanatory power of factors
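A minimal sketch of this regression on simulated data (the factor names, exposures, and alpha are made up) shows how the intercept recovers alpha and the slopes recover the betas:

```python
import numpy as np

# Simulated factor returns and a portfolio with known exposures (illustrative)
rng = np.random.default_rng(0)
n = 500
factor_returns = rng.normal(0, 0.01, size=(n, 2))      # e.g. market and value factors
true_betas = np.array([1.1, 0.4])
true_alpha = 0.0002                                     # Daily alpha
y = true_alpha + factor_returns @ true_betas + rng.normal(0, 0.002, n)

# OLS with a constant column: the intercept is the (daily) alpha
X = np.column_stack([np.ones(n), factor_returns])
coeffs = np.linalg.lstsq(X, y, rcond=None)[0]
alpha_annual = coeffs[0] * 252

# R-squared: share of return variance explained by the factors
r_squared = 1 - np.sum((y - X @ coeffs) ** 2) / np.sum((y - y.mean()) ** 2)

assert np.allclose(coeffs[1:], true_betas, atol=0.1)   # Betas recovered near truth
assert r_squared > 0.8
```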

4. Risk Attribution

  • Marginal contribution measures sensitivity to weight changes
  • Component contributions sum to total volatility
  • Identifies concentration risk
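Marginal contribution is the partial derivative of portfolio volatility with respect to each weight; a quick finite-difference check (with a made-up two-asset covariance) confirms the closed form `(Sigma @ w) / sigma_p` used throughout this module:

```python
import numpy as np

# sigma_p(w) = sqrt(w' Sigma w); MCR is its gradient (made-up 2-asset covariance)
vols = np.array([0.20, 0.15])
corr = np.array([[1.0, 0.4], [0.4, 1.0]])
cov = np.outer(vols, vols) * corr
w = np.array([0.6, 0.4])

port_vol = lambda x: np.sqrt(x @ cov @ x)
mcr = (cov @ w) / port_vol(w)  # Analytic marginal contribution

# Central finite differences should match the closed form
eps = 1e-6
for i in range(len(w)):
    bump = np.zeros_like(w)
    bump[i] = eps
    fd = (port_vol(w + bump) - port_vol(w - bump)) / (2 * eps)
    assert np.isclose(fd, mcr[i], atol=1e-6)
```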

Coming Up Next

In Module 12: Building Dashboards, we'll explore:

  • Dashboard design principles
  • Plotly and Dash fundamentals
  • Interactive financial visualizations
  • Real-time updates with callbacks


Congratulations on completing Module 11!

Module 12: Building Dashboards

Course 3: Quantitative Finance & Portfolio Theory
Part 4: Simulation & Analytics


Learning Objectives

By the end of this module, you will be able to:

  1. Design effective financial dashboards following best practices
  2. Create interactive visualizations using Plotly
  3. Build multi-component dashboard layouts
  4. Understand Dash callback architecture for interactivity
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 4-8: Portfolio Theory & Risk

Setup and Imports

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import yfinance as yf
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: f'{x:.4f}')

print('Module 12: Building Dashboards - Ready!')
print('\nNote: Full Dash apps require running outside Jupyter.')
print('This notebook demonstrates Plotly visualizations and Dash concepts.')

Load Data

# Download data for visualization examples
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)

if isinstance(data.columns, pd.MultiIndex):
    prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
    prices = data

returns = prices.pct_change().dropna()
print(f'Data loaded: {len(returns)} trading days')

Section 12.1: Dashboard Design Principles

Effective dashboards follow clear design principles that maximize information transfer.

In this section, you will learn:

  • Information hierarchy in dashboard design
  • Color conventions for financial data
  • Layout best practices

12.1.1 Information Hierarchy

Financial dashboards should follow a clear structure:

  1. Summary Metrics (top): Key numbers at a glance
  2. Trend Charts (middle): Performance over time
  3. Detailed Analysis (bottom): Drill-down capabilities
Principle Application
Clarity Show the most important metrics prominently
Context Always include benchmarks for comparison
Consistency Use consistent colors for gains (green) and losses (red)
Actionability Highlight items requiring attention
Simplicity Avoid chart junk and unnecessary complexity

Section 12.2: Plotly Fundamentals

Plotly creates interactive visualizations that work in Jupyter and can be embedded in Dash apps.

In this section, you will learn:

  • Creating figures with go.Figure
  • Adding multiple traces
  • Customizing layouts and styling

# Basic line chart with Plotly
fig = go.Figure()

# Normalize prices to 100
normalized = prices / prices.iloc[0] * 100

for col in normalized.columns:
    fig.add_trace(go.Scatter(
        x=normalized.index,
        y=normalized[col],
        name=col,
        mode='lines'
    ))

fig.update_layout(
    title='Asset Performance (Normalized to 100)',
    xaxis_title='Date',
    yaxis_title='Value',
    template='plotly_white',
    hovermode='x unified',
    legend=dict(orientation='h', yanchor='bottom', y=1.02)
)

fig.show()

12.2.1 Risk-Return Scatter Plot

# Calculate metrics
ann_returns = returns.mean() * 252
ann_vol = returns.std() * np.sqrt(252)
sharpe = ann_returns / ann_vol

# Create scatter plot
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=ann_vol * 100,
    y=ann_returns * 100,
    mode='markers+text',
    marker=dict(
        size=np.maximum(sharpe * 30 + 20, 8),  # Floor the size so negative-Sharpe assets still render
        color=sharpe,
        colorscale='RdYlGn',
        colorbar=dict(title='Sharpe'),
        line=dict(width=2, color='white')
    ),
    text=returns.columns,
    textposition='top center',
    hovertemplate=(
        '<b>%{text}</b><br>' +
        'Return: %{y:.1f}%<br>' +
        'Volatility: %{x:.1f}%<br>' +
        '<extra></extra>'
    )
))

fig.update_layout(
    title='Risk-Return Profile',
    xaxis_title='Volatility (%)',
    yaxis_title='Return (%)',
    template='plotly_white',
    showlegend=False
)

fig.show()

Exercise 12.1: Correlation Heatmap (Guided)

Your Task: Create an interactive correlation heatmap using Plotly.

Fill in the blanks to complete the heatmap:

Exercise
Click to reveal solution
def create_correlation_heatmap(returns: pd.DataFrame) -> go.Figure:
    """
    Create an interactive correlation heatmap.
    """
    # Calculate correlation matrix
    corr_matrix = returns.corr()

    # Create heatmap
    fig = go.Figure(
        data=go.Heatmap(
            z=corr_matrix.values,
            x=corr_matrix.columns,
            y=corr_matrix.index,
            colorscale='RdYlGn',
            zmid=0,
            text=np.round(corr_matrix.values, 2),
            texttemplate='%{text}',
            hovertemplate='%{x} vs %{y}<br>Correlation: %{z:.3f}<extra></extra>'
        )
    )

    fig.update_layout(
        title='Asset Correlation Matrix',
        template='plotly_white',
        width=500,
        height=500
    )

    return fig

# Test
heatmap = create_correlation_heatmap(returns)
heatmap.show()

Section 12.3: Financial Charts

In this section, you will learn:

  • Candlestick charts with technical indicators
  • Multi-panel layouts with subplots
  • KPI indicator cards

# Download OHLC data for candlestick chart
spy_ohlc = yf.download('SPY', start='2023-10-01', end='2024-01-01', progress=False)

if isinstance(spy_ohlc.columns, pd.MultiIndex):
    spy_ohlc.columns = spy_ohlc.columns.get_level_values(0)

# Create candlestick with volume
fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    vertical_spacing=0.03,
    row_heights=[0.7, 0.3]
)

# Candlestick chart
fig.add_trace(
    go.Candlestick(
        x=spy_ohlc.index,
        open=spy_ohlc['Open'],
        high=spy_ohlc['High'],
        low=spy_ohlc['Low'],
        close=spy_ohlc['Close'],
        name='SPY'
    ),
    row=1, col=1
)

# Add moving averages
spy_ohlc['MA20'] = spy_ohlc['Close'].rolling(20).mean()
spy_ohlc['MA50'] = spy_ohlc['Close'].rolling(50).mean()

fig.add_trace(
    go.Scatter(x=spy_ohlc.index, y=spy_ohlc['MA20'], name='MA20', line=dict(color='orange', width=1)),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=spy_ohlc.index, y=spy_ohlc['MA50'], name='MA50', line=dict(color='purple', width=1)),
    row=1, col=1
)

# Volume bars
colors = ['red' if spy_ohlc['Close'].iloc[i] < spy_ohlc['Open'].iloc[i] else 'green' 
          for i in range(len(spy_ohlc))]

fig.add_trace(
    go.Bar(x=spy_ohlc.index, y=spy_ohlc['Volume'], name='Volume', marker_color=colors),
    row=2, col=1
)

fig.update_layout(
    title='SPY Price Chart with Volume',
    xaxis_rangeslider_visible=False,
    template='plotly_white',
    height=600
)

fig.show()

12.3.1 KPI Cards

def create_kpi_cards(metrics: dict) -> go.Figure:
    """
    Create KPI indicator cards.
    
    Args:
        metrics: Dict of {name: {'value': x, 'reference': y, 'suffix': '%'}}
    """
    n_metrics = len(metrics)
    fig = make_subplots(
        rows=1, cols=n_metrics,
        specs=[[{'type': 'indicator'}] * n_metrics]
    )
    
    for i, (name, data) in enumerate(metrics.items()):
        fig.add_trace(
            go.Indicator(
                mode='number+delta',
                value=data['value'],
                title={'text': name, 'font': {'size': 14}},
                delta={'reference': data.get('reference', 0), 
                       'relative': data.get('relative', False)},
                number={'suffix': data.get('suffix', ''),
                       'font': {'size': 24}}
            ),
            row=1, col=i+1
        )
    
    fig.update_layout(height=200, template='plotly_white')
    return fig

# Calculate metrics
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
w = np.array([weights[t] for t in prices.columns])
port_ret = (returns * w).sum(axis=1)

metrics = {
    'YTD Return': {
        'value': ((1 + port_ret).prod() - 1) * 100,
        'reference': ((1 + returns['SPY']).prod() - 1) * 100,
        'suffix': '%'
    },
    'Volatility': {
        'value': port_ret.std() * np.sqrt(252) * 100,
        'reference': returns['SPY'].std() * np.sqrt(252) * 100,
        'suffix': '%'
    },
    'Sharpe Ratio': {
        'value': (port_ret.mean() * 252) / (port_ret.std() * np.sqrt(252)),
        'reference': 1.0
    }
}

kpi_fig = create_kpi_cards(metrics)
kpi_fig.show()

Exercise 12.2: Portfolio Dashboard Layout (Guided)

Your Task: Create a multi-panel dashboard using Plotly subplots.

Fill in the blanks to complete the dashboard:

Exercise
Click to reveal solution
def create_portfolio_dashboard(prices: pd.DataFrame, returns: pd.DataFrame,
                               weights: dict) -> go.Figure:
    """
    Create a comprehensive portfolio dashboard.
    """
    w = np.array([weights[t] for t in returns.columns])
    port_returns = (returns * w).sum(axis=1)
    port_cum = (1 + port_returns).cumprod()

    # Create 2x2 subplot layout
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Portfolio Performance', 'Asset Allocation',
                       'Drawdown', 'Return Distribution'),
        specs=[
            [{'type': 'scatter'}, {'type': 'pie'}],
            [{'type': 'scatter'}, {'type': 'histogram'}]
        ]
    )

    # Performance chart
    fig.add_trace(
        go.Scatter(x=port_cum.index, y=port_cum * 100, name='Portfolio',
                  line=dict(color='steelblue', width=2)),
        row=1, col=1
    )

    # Pie chart for allocation
    fig.add_trace(
        go.Pie(labels=list(weights.keys()), values=list(weights.values()),
               hole=0.4),
        row=1, col=2
    )

    # Drawdown chart
    running_max = port_cum.cummax()
    drawdown = (port_cum - running_max) / running_max
    fig.add_trace(
        go.Scatter(x=drawdown.index, y=drawdown * 100, 
                  fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
                  line=dict(color='red', width=1), name='Drawdown'),
        row=2, col=1
    )

    # Histogram for return distribution
    fig.add_trace(
        go.Histogram(x=port_returns * 100, nbinsx=50,
                    marker_color='steelblue', name='Returns'),
        row=2, col=2
    )

    fig.update_layout(height=700, showlegend=False, template='plotly_white')
    return fig

# Test
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
dashboard = create_portfolio_dashboard(prices, returns, weights)
dashboard.show()

Exercise 12.3: Risk Dashboard (Guided)

Your Task: Create a risk monitoring dashboard with VaR, drawdown, and volatility panels.

Fill in the blanks to complete the dashboard:

Exercise
Click to reveal solution
def create_risk_dashboard(returns: pd.Series, window: int = 21, 
                          confidence: float = 0.95) -> go.Figure:
    """
    Create a risk monitoring dashboard.
    """
    # Calculate rolling volatility (annualized)
    rolling_vol = returns.rolling(window).std() * np.sqrt(252)

    # Calculate rolling VaR (using quantile)
    rolling_var = returns.rolling(window).quantile(1 - confidence) * -1

    # Calculate drawdown
    cum_returns = (1 + returns).cumprod()
    running_max = cum_returns.cummax()
    drawdown = (cum_returns - running_max) / running_max

    # Create 3-panel layout
    fig = make_subplots(
        rows=3, cols=1,
        shared_xaxes=True,
        subplot_titles=(
            f'{window}-Day Rolling VaR ({confidence*100:.0f}%)',
            'Drawdown',
            f'{window}-Day Rolling Volatility'
        )
    )

    fig.add_trace(
        go.Scatter(x=rolling_var.index, y=rolling_var * 100,
                  fill='tozeroy', fillcolor='rgba(255,100,100,0.3)',
                  line=dict(color='red', width=1)),
        row=1, col=1
    )

    fig.add_trace(
        go.Scatter(x=drawdown.index, y=drawdown * 100,
                  fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
                  line=dict(color='darkred', width=1)),
        row=2, col=1
    )

    fig.add_trace(
        go.Scatter(x=rolling_vol.index, y=rolling_vol * 100,
                  fill='tozeroy', fillcolor='rgba(255,165,0,0.3)',
                  line=dict(color='orange', width=1)),
        row=3, col=1
    )

    fig.update_layout(height=700, showlegend=False, template='plotly_white',
                     title='Risk Monitoring Dashboard')
    return fig

# Test
# Align weights by ticker; yfinance sorts columns alphabetically
port_weights = pd.Series({'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1})
port_returns = (returns * port_weights).sum(axis=1)
risk_dash = create_risk_dashboard(port_returns)
risk_dash.show()

Section 12.4: Dash Architecture

Dash enables full interactivity through callbacks: functions that re-run and update outputs whenever their inputs change.

In this section, you will learn:

  • Dash application structure
  • Callbacks for interactivity
  • Input/Output components

12.4.1 Dash Application Structure

from dash import Dash, html, dcc, Input, Output

app = Dash(__name__)

app.layout = html.Div([
    # Input components
    dcc.Dropdown(id='asset-selector', options=[...]),

    # Output components
    dcc.Graph(id='performance-chart')
])

@app.callback(
    Output('performance-chart', 'figure'),
    Input('asset-selector', 'value')
)
def update_chart(selected_asset):
    # Create and return figure
    return fig

if __name__ == '__main__':
    app.run_server(debug=True)

# Simulating interactive behavior in Jupyter
def create_interactive_chart(assets_to_show: list, lookback_days: int) -> go.Figure:
    """
    Create a chart that would be updated by Dash callbacks.
    """
    filtered_prices = prices[assets_to_show].iloc[-lookback_days:]
    normalized = filtered_prices / filtered_prices.iloc[0] * 100
    
    fig = go.Figure()
    
    for col in normalized.columns:
        fig.add_trace(go.Scatter(
            x=normalized.index,
            y=normalized[col],
            name=col,
            mode='lines'
        ))
    
    fig.update_layout(
        title=f'Performance Over Last {lookback_days} Days',
        xaxis_title='Date',
        yaxis_title='Normalized Value',
        template='plotly_white',
        hovermode='x unified'
    )
    
    return fig

# Simulate different states
print("Simulating: All assets, 252 days")
fig1 = create_interactive_chart(['SPY', 'QQQ', 'TLT', 'GLD'], 252)
fig1.show()

Exercise 12.4: Monthly Returns Heatmap (Open-ended)

Your Task:

Build a function that:

  • Creates a calendar heatmap of monthly returns
  • Shows months as columns and years as rows
  • Uses a diverging color scale centered on zero
  • Includes hover information

Your implementation:

Exercise
Click to reveal solution
def create_monthly_returns_heatmap(returns: pd.Series, title: str = 'Monthly Returns') -> go.Figure:
    """
    Create a calendar heatmap of monthly returns.
    """
    # Calculate monthly returns
    monthly = returns.resample('M').apply(lambda x: (1 + x).prod() - 1)

    # Create pivot table
    monthly_df = pd.DataFrame({
        'Year': monthly.index.year,
        'Month': monthly.index.month,
        'Return': monthly.values
    })

    pivot = monthly_df.pivot(index='Year', columns='Month', values='Return')

    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

    fig = go.Figure(data=go.Heatmap(
        z=pivot.values * 100,
        x=[months[m - 1] for m in pivot.columns],  # Label by actual month number, not column position
        y=pivot.index,
        colorscale='RdYlGn',
        zmid=0,
        text=np.round(pivot.values * 100, 1),
        texttemplate='%{text}%',
        hovertemplate='%{y} %{x}<br>Return: %{z:.1f}%<extra></extra>',
        colorbar=dict(title='Return (%)')
    ))

    fig.update_layout(
        title=title,
        template='plotly_white',
        height=300
    )

    return fig

# Test
# Align weights by ticker rather than relying on column order
port_weights = pd.Series({'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1})
port_returns = (returns * port_weights).sum(axis=1)
heatmap = create_monthly_returns_heatmap(port_returns, 'Portfolio Monthly Returns')
heatmap.show()

Exercise 12.5: Performance Comparison Chart (Open-ended)

Your Task:

Build a function that:

  • Compares multiple portfolios against a benchmark
  • Shows cumulative returns over time
  • Includes a summary table with key metrics
  • Uses appropriate color coding

Your implementation:

Exercise
Click to reveal solution
def create_performance_comparison(returns: pd.DataFrame, portfolios: dict,
                                  benchmark: str = 'SPY') -> go.Figure:
    """
    Create a performance comparison chart.

    Args:
        portfolios: Dict of {name: {ticker: weight}}
        benchmark: Benchmark ticker
    """
    # Calculate portfolio returns
    port_returns = {}
    for name, weights in portfolios.items():
        w = np.array([weights.get(t, 0) for t in returns.columns])
        port_returns[name] = (returns * w).sum(axis=1)

    # Add benchmark
    port_returns['Benchmark'] = returns[benchmark]

    # Calculate cumulative returns
    cum_returns = {name: (1 + ret).cumprod() for name, ret in port_returns.items()}

    # Create figure
    colors = ['steelblue', 'orange', 'green', 'red', 'purple', 'gray']
    fig = go.Figure()

    for i, (name, cum_ret) in enumerate(cum_returns.items()):
        line_style = dict(dash='dash') if name == 'Benchmark' else dict()
        fig.add_trace(go.Scatter(
            x=cum_ret.index,
            y=(cum_ret - 1) * 100,
            name=name,
            mode='lines',
            line=dict(color=colors[i % len(colors)], **line_style)
        ))

    # Add metrics annotation
    metrics_text = "<b>Annualized Metrics:</b><br>"
    for name, ret in port_returns.items():
        ann_ret = ret.mean() * 252 * 100
        sharpe = (ret.mean() * 252) / (ret.std() * np.sqrt(252))
        metrics_text += f"{name}: {ann_ret:.1f}% (SR: {sharpe:.2f})<br>"

    fig.add_annotation(
        x=0.02, y=0.98, xref='paper', yref='paper',
        text=metrics_text, showarrow=False,
        font=dict(size=10), align='left',
        bgcolor='white', bordercolor='gray', borderwidth=1
    )

    fig.update_layout(
        title='Portfolio Performance Comparison',
        xaxis_title='Date',
        yaxis_title='Cumulative Return (%)',
        template='plotly_white',
        hovermode='x unified',
        legend=dict(orientation='h', y=1.1)
    )

    return fig

# Test
portfolios = {
    'Aggressive': {'SPY': 0.7, 'QQQ': 0.3, 'TLT': 0.0, 'GLD': 0.0},
    'Balanced': {'SPY': 0.4, 'QQQ': 0.2, 'TLT': 0.3, 'GLD': 0.1},
    'Defensive': {'SPY': 0.2, 'QQQ': 0.0, 'TLT': 0.5, 'GLD': 0.3}
}
comparison = create_performance_comparison(returns, portfolios)
comparison.show()

Exercise 12.6: Complete Dashboard Class (Open-ended)

Your Task:

Build a comprehensive dashboard class that includes:

  • Performance visualization methods
  • Risk charts (drawdown, VaR, volatility)
  • Allocation views (pie, bar)
  • Full dashboard layout generation
  • Dark/light theme support

Your implementation:

Exercise
Click to reveal solution
class QuantDashboard:
    """
    Complete quantitative finance dashboard builder.
    """

    def __init__(self, returns: pd.DataFrame, prices: pd.DataFrame = None,
                 benchmark: str = None):
        self.returns = returns
        self.prices = prices
        self.benchmark = benchmark
        self.template = 'plotly_white'

    def set_dark_theme(self):
        """Switch to dark theme."""
        self.template = 'plotly_dark'

    def performance_chart(self, normalize: bool = True) -> go.Figure:
        """Create performance time series chart."""
        if self.prices is not None:
            data = self.prices
        else:
            data = (1 + self.returns).cumprod()

        if normalize:
            data = data / data.iloc[0] * 100

        fig = go.Figure()
        for col in data.columns:
            fig.add_trace(go.Scatter(x=data.index, y=data[col], name=col, mode='lines'))

        fig.update_layout(
            title='Performance Over Time',
            xaxis_title='Date',
            yaxis_title='Normalized Value' if normalize else 'Value',
            template=self.template,
            hovermode='x unified'
        )
        return fig

    def drawdown_chart(self) -> go.Figure:
        """Create drawdown chart."""
        if isinstance(self.returns, pd.DataFrame):
            ret = self.returns.mean(axis=1)
        else:
            ret = self.returns

        cum = (1 + ret).cumprod()
        running_max = cum.cummax()
        drawdown = (cum - running_max) / running_max

        fig = go.Figure(go.Scatter(
            x=drawdown.index, y=drawdown * 100,
            fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
            line=dict(color='red', width=1)
        ))

        fig.update_layout(
            title='Drawdown',
            xaxis_title='Date',
            yaxis_title='Drawdown (%)',
            template=self.template
        )
        return fig

    def correlation_heatmap(self) -> go.Figure:
        """Create correlation heatmap."""
        corr = self.returns.corr()

        fig = go.Figure(go.Heatmap(
            z=corr.values, x=corr.columns, y=corr.index,
            colorscale='RdYlGn', zmid=0,
            text=np.round(corr.values, 2), texttemplate='%{text}'
        ))

        fig.update_layout(title='Correlation Matrix', template=self.template)
        return fig

    def risk_return_scatter(self) -> go.Figure:
        """Create risk-return scatter plot."""
        ann_ret = self.returns.mean() * 252
        ann_vol = self.returns.std() * np.sqrt(252)
        sharpe = ann_ret / ann_vol

        fig = go.Figure(go.Scatter(
            x=ann_vol * 100, y=ann_ret * 100,
            mode='markers+text',
            text=self.returns.columns,
            textposition='top center',
            # Floor marker sizes so assets with negative Sharpe ratios still render
            marker=dict(size=np.maximum(sharpe * 20 + 15, 6), color=sharpe, colorscale='RdYlGn')
        ))

        fig.update_layout(
            title='Risk-Return Profile',
            xaxis_title='Volatility (%)',
            yaxis_title='Return (%)',
            template=self.template
        )
        return fig

    def full_dashboard(self) -> go.Figure:
        """Create comprehensive dashboard with all components."""
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Performance', 'Risk-Return', 'Drawdown', 'Correlation'),
            specs=[
                [{'type': 'scatter'}, {'type': 'scatter'}],
                [{'type': 'scatter'}, {'type': 'heatmap'}]
            ]
        )

        # Get individual figures
        perf = self.performance_chart()
        rr = self.risk_return_scatter()
        dd = self.drawdown_chart()
        corr = self.correlation_heatmap()

        # Add traces
        for trace in perf.data:
            fig.add_trace(trace, row=1, col=1)
        for trace in rr.data:
            fig.add_trace(trace, row=1, col=2)
        for trace in dd.data:
            fig.add_trace(trace, row=2, col=1)
        for trace in corr.data:
            fig.add_trace(trace, row=2, col=2)

        fig.update_layout(
            height=800, showlegend=False, template=self.template,
            title='Quantitative Finance Dashboard'
        )

        return fig

# Demo
dashboard = QuantDashboard(returns, prices)
full_dash = dashboard.full_dashboard()
full_dash.show()

Module Project: Portfolio Analytics Dashboard

Put together everything you've learned!

Your Challenge:

Build a complete portfolio analytics dashboard that includes:

  1. KPI cards showing key metrics (return, volatility, Sharpe)
  2. Performance chart with benchmark comparison
  3. Asset allocation visualization (pie chart)
  4. Risk metrics panel (drawdown, VaR, volatility)
  5. Monthly returns heatmap
  6. Correlation matrix

# YOUR CODE HERE - Module Project
Click to reveal solution
class PortfolioAnalyticsDashboard:
    """
    Complete portfolio analytics dashboard.
    """

    def __init__(self, returns: pd.DataFrame, weights: dict, benchmark: str = 'SPY'):
        self.returns = returns
        self.weights = weights
        self.benchmark = benchmark
        self.template = 'plotly_white'

        # Calculate portfolio returns
        w = np.array([weights.get(t, 0) for t in returns.columns])
        self.port_returns = (returns * w).sum(axis=1)
        self.bench_returns = returns[benchmark]

    def _calculate_metrics(self) -> dict:
        """Calculate key portfolio metrics."""
        return {
            'total_return': (1 + self.port_returns).prod() - 1,
            'ann_return': self.port_returns.mean() * 252,
            'ann_vol': self.port_returns.std() * np.sqrt(252),
            'sharpe': (self.port_returns.mean() * 252) / (self.port_returns.std() * np.sqrt(252)),
            'max_dd': ((1 + self.port_returns).cumprod() / 
                      (1 + self.port_returns).cumprod().cummax() - 1).min()
        }

    def generate_dashboard(self) -> go.Figure:
        """Generate complete dashboard."""
        metrics = self._calculate_metrics()

        fig = make_subplots(
            rows=3, cols=3,
            subplot_titles=[
                'Total Return', 'Volatility', 'Sharpe Ratio',
                'Performance vs Benchmark', 'Asset Allocation', 'Drawdown',
                'Monthly Returns', 'Correlation Matrix', 'Risk-Return'
            ],
            specs=[
                [{'type': 'indicator'}, {'type': 'indicator'}, {'type': 'indicator'}],
                [{'type': 'scatter'}, {'type': 'pie'}, {'type': 'scatter'}],
                [{'type': 'heatmap'}, {'type': 'heatmap'}, {'type': 'scatter'}]
            ],
            vertical_spacing=0.1,
            horizontal_spacing=0.1
        )

        # KPI Cards
        fig.add_trace(go.Indicator(
            mode='number', value=metrics['total_return'] * 100,
            number={'suffix': '%', 'font': {'size': 24}}
        ), row=1, col=1)

        fig.add_trace(go.Indicator(
            mode='number', value=metrics['ann_vol'] * 100,
            number={'suffix': '%', 'font': {'size': 24}}
        ), row=1, col=2)

        fig.add_trace(go.Indicator(
            mode='number', value=metrics['sharpe'],
            number={'font': {'size': 24}}
        ), row=1, col=3)

        # Performance
        port_cum = (1 + self.port_returns).cumprod()
        bench_cum = (1 + self.bench_returns).cumprod()
        fig.add_trace(go.Scatter(x=port_cum.index, y=port_cum, name='Portfolio',
                                line=dict(color='steelblue')), row=2, col=1)
        fig.add_trace(go.Scatter(x=bench_cum.index, y=bench_cum, name='Benchmark',
                                line=dict(color='gray', dash='dash')), row=2, col=1)

        # Allocation
        active_weights = {k: v for k, v in self.weights.items() if v > 0}
        fig.add_trace(go.Pie(labels=list(active_weights.keys()), 
                            values=list(active_weights.values()), hole=0.4), row=2, col=2)

        # Drawdown
        drawdown = (port_cum / port_cum.cummax() - 1) * 100
        fig.add_trace(go.Scatter(x=drawdown.index, y=drawdown,
                                fill='tozeroy', fillcolor='rgba(255,0,0,0.3)',
                                line=dict(color='red')), row=2, col=3)

        # Monthly returns heatmap
        monthly = self.port_returns.resample('M').apply(lambda x: (1 + x).prod() - 1)
        monthly_df = pd.DataFrame({'Year': monthly.index.year, 'Month': monthly.index.month, 
                                   'Return': monthly.values})
        pivot = monthly_df.pivot(index='Year', columns='Month', values='Return')
        fig.add_trace(go.Heatmap(z=pivot.values * 100, x=list(range(1, 13)), y=pivot.index,
                                colorscale='RdYlGn', zmid=0), row=3, col=1)

        # Correlation
        corr = self.returns.corr()
        fig.add_trace(go.Heatmap(z=corr.values, x=corr.columns, y=corr.index,
                                colorscale='RdYlGn', zmid=0), row=3, col=2)

        # Risk-Return
        ann_ret = self.returns.mean() * 252
        ann_vol = self.returns.std() * np.sqrt(252)
        fig.add_trace(go.Scatter(
            x=ann_vol * 100, y=ann_ret * 100,
            mode='markers+text', text=self.returns.columns,
            textposition='top center',
            marker=dict(size=15, color=ann_ret / ann_vol, colorscale='RdYlGn')
        ), row=3, col=3)

        fig.update_layout(
            height=1000, template=self.template, showlegend=False,
            title=f'Portfolio Analytics Dashboard | Return: {metrics["ann_return"]*100:.1f}% | Sharpe: {metrics["sharpe"]:.2f}'
        )

        return fig

# Demo
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
dashboard = PortfolioAnalyticsDashboard(returns, weights)
full_dash = dashboard.generate_dashboard()
full_dash.show()

Key Takeaways

What You Learned

1. Dashboard Design

  • Follow information hierarchy: summary -> trends -> details
  • Use consistent color conventions (green=good, red=bad)
  • Keep it simple and actionable

2. Plotly Fundamentals

  • go.Figure() for custom charts
  • make_subplots() for multi-panel layouts
  • Templates for consistent styling

3. Financial Charts

  • Candlestick charts for price data
  • Heatmaps for correlations and calendar views
  • Scatter plots for risk-return analysis

4. Dash Architecture

  • Callbacks connect inputs to outputs
  • Layout defines component structure
  • Full interactivity without page refresh

Coming Up Next

In Module 13: Professional Reporting, we'll explore:

  • Automated report generation
  • PDF and Excel output
  • Performance tear sheets
  • Scheduled report delivery


Congratulations on completing Module 12!

Module 13: Professional Reporting

Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure


Learning Objectives

By the end of this module, you will be able to:

  1. Design professional financial reports for different audiences
  2. Automate PDF report generation with ReportLab
  3. Create formatted Excel workbooks with openpyxl
  4. Build performance tear sheets for strategy evaluation
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 11, 12

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy import stats
import io
import os
import warnings
warnings.filterwarnings('ignore')

# PDF generation
try:
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import letter, A4
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer, Image
    from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
    from reportlab.lib.units import inch
    PDF_AVAILABLE = True
except ImportError:
    PDF_AVAILABLE = False
    print("ReportLab not installed. PDF generation will be simulated.")

# Excel generation
try:
    import openpyxl
    from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
    from openpyxl.chart import LineChart, Reference, BarChart
    EXCEL_AVAILABLE = True
except ImportError:
    EXCEL_AVAILABLE = False
    print("openpyxl not installed. Excel generation will be simulated.")

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Module 13: Professional Reporting - Ready!')

Load Data

# Download data
tickers = ['SPY', 'QQQ', 'TLT', 'GLD']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)

if isinstance(data.columns, pd.MultiIndex):
    prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
    prices = data

returns = prices.pct_change().dropna()

# Create sample portfolio
weights = {'SPY': 0.4, 'QQQ': 0.25, 'TLT': 0.25, 'GLD': 0.1}
w = np.array([weights[t] for t in tickers])
portfolio_returns = (returns[tickers] * w).sum(axis=1)
benchmark_returns = returns['SPY']

print(f'Data loaded: {len(returns)} trading days')

Section 13.1: Report Design Principles

Professional portfolio management requires clear, consistent reporting. Different audiences need different information at varying levels of detail.

In this section, you will learn:

  • Types of financial reports
  • Report structure best practices
  • Calculating report metrics

13.1.1 Types of Financial Reports

Report Type Audience Frequency Content
Client Report External Monthly/Quarterly Performance, holdings, commentary
Risk Report Internal Daily VaR, limits, breaches
Regulatory Report Regulators Quarterly/Annual Compliance, positions
Tear Sheet Marketing On-demand Strategy summary

13.1.2 Report Structure Best Practices

  1. Executive Summary: Key numbers at a glance
  2. Performance Section: Returns, attribution, benchmarks
  3. Risk Section: Volatility, VaR, drawdowns
  4. Holdings Section: Current positions, weights
  5. Appendix: Methodology, disclaimers
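As a minimal sketch of this structure (the section titles, metric keys, and `assemble_report` helper below are illustrative assumptions, not part of the course code), a plain-text report body can be assembled from an ordered list of sections:

```python
# Hypothetical sketch: render report sections in the recommended order.
# Metric keys ('total_return', 'sharpe', 'max_dd') are assumed for illustration.
REPORT_SECTIONS = [
    ('Executive Summary', lambda m: f"Total return: {m['total_return']*100:.2f}%"),
    ('Performance',       lambda m: f"Sharpe ratio: {m['sharpe']:.2f}"),
    ('Risk',              lambda m: f"Max drawdown: {m['max_dd']*100:.2f}%"),
    ('Appendix',          lambda m: 'Methodology and disclaimers'),
]

def assemble_report(metrics: dict) -> str:
    """Render each section heading followed by its one-line content."""
    lines = []
    for title, render in REPORT_SECTIONS:
        lines.append(title.upper())
        lines.append(render(metrics))
        lines.append('')  # blank line between sections
    return '\n'.join(lines)

text = assemble_report({'total_return': 0.1234, 'sharpe': 1.25, 'max_dd': -0.087})
```

Keeping the section order in one list makes it easy to reuse the same structure for the PDF and Excel generators later in this module.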

13.1.3 Calculating Report Metrics

def calculate_report_metrics(returns: pd.Series, benchmark_returns: pd.Series, 
                            risk_free_rate: float = 0.0) -> dict:
    """
    Calculate all metrics needed for a performance report.
    
    Parameters:
    -----------
    returns : Series
        Portfolio returns
    benchmark_returns : Series
        Benchmark returns
    risk_free_rate : float
        Annual risk-free rate
    
    Returns:
    --------
    dict : Report metrics
    """
    # Cumulative returns
    cum_port = (1 + returns).cumprod()
    cum_bench = (1 + benchmark_returns).cumprod()
    
    # Period returns
    total_return = cum_port.iloc[-1] - 1
    bench_return = cum_bench.iloc[-1] - 1
    
    # Annualized metrics
    n_years = len(returns) / 252
    ann_return = (1 + total_return) ** (1/n_years) - 1
    ann_vol = returns.std() * np.sqrt(252)
    
    # Risk-adjusted metrics
    sharpe = (ann_return - risk_free_rate) / ann_vol
    
    # Sortino (one common convention: std of negative returns only)
    downside_returns = returns[returns < 0]
    downside_vol = downside_returns.std() * np.sqrt(252)
    sortino = (ann_return - risk_free_rate) / downside_vol
    
    # Drawdown
    running_max = cum_port.cummax()
    drawdown = (cum_port - running_max) / running_max
    max_dd = drawdown.min()
    
    # Calmar
    calmar = ann_return / abs(max_dd) if max_dd != 0 else 0
    
    # Relative metrics
    active_returns = returns - benchmark_returns
    tracking_error = active_returns.std() * np.sqrt(252)
    information_ratio = (active_returns.mean() * 252) / tracking_error
    
    # Beta and alpha (CAPM; both measured in excess of the risk-free rate)
    cov = np.cov(returns, benchmark_returns)[0, 1]
    var_bench = benchmark_returns.var()
    beta = cov / var_bench
    alpha = (ann_return - risk_free_rate) - beta * (benchmark_returns.mean() * 252 - risk_free_rate)
    
    # VaR and ES
    var_95 = -np.percentile(returns, 5)
    es_95 = -returns[returns <= np.percentile(returns, 5)].mean()
    
    # Win rate
    win_rate = (returns > 0).mean()
    
    return {
        'total_return': total_return,
        'benchmark_return': bench_return,
        'active_return': total_return - bench_return,
        'ann_return': ann_return,
        'ann_volatility': ann_vol,
        'sharpe_ratio': sharpe,
        'sortino_ratio': sortino,
        'calmar_ratio': calmar,
        'max_drawdown': max_dd,
        'tracking_error': tracking_error,
        'information_ratio': information_ratio,
        'beta': beta,
        'alpha': alpha,
        'var_95': var_95,
        'es_95': es_95,
        'win_rate': win_rate,
        'n_periods': len(returns),
        'start_date': returns.index[0],
        'end_date': returns.index[-1]
    }

# Calculate metrics
metrics = calculate_report_metrics(portfolio_returns, benchmark_returns)

print("Portfolio Performance Metrics")
print("=" * 50)
print(f"\nPeriod: {metrics['start_date'].strftime('%Y-%m-%d')} to {metrics['end_date'].strftime('%Y-%m-%d')}")
print(f"\nReturn Metrics:")
print(f"  Total Return:     {metrics['total_return']*100:.2f}%")
print(f"  Benchmark Return: {metrics['benchmark_return']*100:.2f}%")
print(f"  Annualized:       {metrics['ann_return']*100:.2f}%")

Exercise 13.1: Report Metrics Calculator (Guided)

Your Task: Complete the function to calculate rolling performance metrics for a report.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def calculate_rolling_metrics(returns: pd.Series, window: int = 252) -> pd.DataFrame:
    """
    Calculate rolling performance metrics for reporting.
    """
    # Calculate rolling mean return (annualized)
    rolling_return = returns.rolling(window).mean() * 252

    # Calculate rolling volatility (annualized)
    rolling_vol = returns.rolling(window).std() * np.sqrt(252)

    # Calculate rolling Sharpe ratio
    rolling_sharpe = rolling_return / rolling_vol

    return pd.DataFrame({
        'return': rolling_return,
        'volatility': rolling_vol,
        'sharpe': rolling_sharpe
    })

# Test
rolling_metrics = calculate_rolling_metrics(portfolio_returns)
print(rolling_metrics.tail())

Section 13.2: Automated PDF Reports

PDF reports provide professional, print-ready documents for client communications.

In this section, you will learn:

  • ReportLab basics
  • Creating tables and charts for PDFs
  • Building multi-page reports

13.2.1 Creating Charts for Reports

def create_performance_chart(returns: pd.Series, benchmark_returns: pd.Series, 
                             filename: str = 'performance.png') -> str:
    """
    Create a performance chart for the report.
    """
    cum_port = (1 + returns).cumprod() * 100
    cum_bench = (1 + benchmark_returns).cumprod() * 100
    
    fig, ax = plt.subplots(figsize=(8, 4))
    
    ax.plot(cum_port.index, cum_port, label='Portfolio', linewidth=2, color='#2E86AB')
    ax.plot(cum_bench.index, cum_bench, label='Benchmark', linewidth=1.5, 
            linestyle='--', color='gray')
    
    ax.set_xlabel('Date')
    ax.set_ylabel('Growth of $100')
    ax.set_title('Portfolio Performance vs Benchmark')
    ax.legend(loc='upper left')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(filename, dpi=150, bbox_inches='tight')
    plt.close()
    
    return filename


def create_drawdown_chart(returns: pd.Series, filename: str = 'drawdown.png') -> str:
    """
    Create a drawdown chart for the report.
    """
    cum_returns = (1 + returns).cumprod()
    running_max = cum_returns.cummax()
    drawdown = (cum_returns - running_max) / running_max
    
    fig, ax = plt.subplots(figsize=(8, 3))
    
    ax.fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
    ax.plot(drawdown.index, drawdown * 100, color='darkred', linewidth=1)
    
    ax.set_xlabel('Date')
    ax.set_ylabel('Drawdown (%)')
    ax.set_title('Portfolio Drawdown')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(filename, dpi=150, bbox_inches='tight')
    plt.close()
    
    return filename

# Create charts
perf_chart = create_performance_chart(portfolio_returns, benchmark_returns)
dd_chart = create_drawdown_chart(portfolio_returns)
print(f"Charts created: {perf_chart}, {dd_chart}")

13.2.2 PDF Report Generation

def generate_pdf_report(metrics: dict, weights: dict, 
                        filename: str = 'portfolio_report.pdf') -> str:
    """
    Generate a professional PDF report.
    """
    if not PDF_AVAILABLE:
        print("PDF generation simulated (ReportLab not installed)")
        return None
    
    doc = SimpleDocTemplate(filename, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []
    
    # Title
    title_style = ParagraphStyle(
        'CustomTitle',
        parent=styles['Heading1'],
        fontSize=24,
        spaceAfter=30,
        alignment=1
    )
    story.append(Paragraph('Portfolio Performance Report', title_style))
    
    # Report date
    report_date = datetime.now().strftime('%B %d, %Y')
    date_style = ParagraphStyle('DateStyle', parent=styles['Normal'], 
                                fontSize=12, alignment=1)
    story.append(Paragraph(f'Report Date: {report_date}', date_style))
    story.append(Spacer(1, 20))
    
    # Summary table
    story.append(Paragraph('Executive Summary', styles['Heading2']))
    story.append(Spacer(1, 10))
    
    summary_data = [
        ['Metric', 'Value'],
        ['Total Return', f"{metrics['total_return']*100:.2f}%"],
        ['Benchmark Return', f"{metrics['benchmark_return']*100:.2f}%"],
        ['Sharpe Ratio', f"{metrics['sharpe_ratio']:.2f}"],
        ['Max Drawdown', f"{metrics['max_drawdown']*100:.2f}%"],
    ]
    
    summary_table = Table(summary_data, colWidths=[2.5*inch, 2*inch])
    summary_table.setStyle(TableStyle([
        ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#2E86AB')),
        ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
        ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
        ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
        ('GRID', (0, 0), (-1, -1), 1, colors.black),
    ]))
    story.append(summary_table)
    
    # Build PDF
    doc.build(story)
    return filename

if PDF_AVAILABLE:
    pdf_file = generate_pdf_report(metrics, weights)
    print(f"PDF report generated: {pdf_file}")
else:
    print("PDF generation skipped (install reportlab to enable)")

Exercise 13.2: Monthly Summary Table (Guided)

Your Task: Complete the function to create a monthly returns summary table for PDF reports.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def create_monthly_summary(returns: pd.Series) -> pd.DataFrame:
    """
    Create monthly summary statistics for report table.
    """
    # Resample returns to monthly frequency and calculate compound return
    monthly_return = returns.resample('M').apply(lambda x: (1+x).prod() - 1)

    # Calculate monthly volatility (annualized)
    monthly_vol = returns.resample('M').std() * np.sqrt(252)

    # Find best day each month
    best_day = returns.resample('M').max()

    # Find worst day each month
    worst_day = returns.resample('M').min()

    return pd.DataFrame({
        'Return': monthly_return,
        'Volatility': monthly_vol,
        'Best Day': best_day,
        'Worst Day': worst_day
    })

# Test
summary = create_monthly_summary(portfolio_returns)
print(summary.tail())

Section 13.3: Excel Reports

Excel workbooks allow recipients to explore data interactively.

In this section, you will learn:

  • Creating formatted Excel workbooks
  • Multi-sheet reports
  • Conditional formatting

13.3.1 Excel Report Generation

def generate_excel_report(returns: pd.Series, benchmark_returns: pd.Series, 
                          metrics: dict, weights: dict,
                          filename: str = 'portfolio_report.xlsx') -> str:
    """
    Generate a comprehensive Excel report.
    """
    if not EXCEL_AVAILABLE:
        print("Excel generation simulated (openpyxl not installed)")
        return None
    
    wb = openpyxl.Workbook()
    
    # Define styles
    header_font = Font(bold=True, color='FFFFFF', size=12)
    header_fill = PatternFill(start_color='2E86AB', end_color='2E86AB', fill_type='solid')
    percent_format = '0.00%'
    
    # Summary Sheet
    ws_summary = wb.active
    ws_summary.title = 'Summary'
    
    ws_summary['A1'] = 'Portfolio Performance Report'
    ws_summary['A1'].font = Font(bold=True, size=16)
    ws_summary['A2'] = f'Report Date: {datetime.now().strftime("%Y-%m-%d")}'
    
    # Metrics
    metrics_data = [
        ('Metric', 'Value'),
        ('Total Return', metrics['total_return']),
        ('Benchmark Return', metrics['benchmark_return']),
        ('Sharpe Ratio', metrics['sharpe_ratio']),
        ('Max Drawdown', metrics['max_drawdown']),
    ]
    
    for i, (metric, value) in enumerate(metrics_data):
        row = 4 + i
        ws_summary[f'A{row}'] = metric
        ws_summary[f'B{row}'] = value
        if i == 0:
            ws_summary[f'A{row}'].font = header_font
            ws_summary[f'A{row}'].fill = header_fill
            ws_summary[f'B{row}'].font = header_font
            ws_summary[f'B{row}'].fill = header_fill
        elif metric != 'Sharpe Ratio':
            ws_summary[f'B{row}'].number_format = percent_format
    
    ws_summary.column_dimensions['A'].width = 20
    ws_summary.column_dimensions['B'].width = 15
    
    # Monthly Returns Sheet with conditional formatting
    ws_monthly = wb.create_sheet('Monthly Returns')
    monthly = returns.resample('M').apply(lambda x: (1+x).prod() - 1)
    
    ws_monthly['A1'] = 'Date'
    ws_monthly['B1'] = 'Return'
    ws_monthly['A1'].font = header_font
    ws_monthly['A1'].fill = header_fill
    ws_monthly['B1'].font = header_font
    ws_monthly['B1'].fill = header_fill
    
    for i, (date, ret) in enumerate(monthly.items()):
        row = 2 + i
        ws_monthly[f'A{row}'] = date.strftime('%Y-%m')
        ws_monthly[f'B{row}'] = ret
        ws_monthly[f'B{row}'].number_format = percent_format
        
        # Color code
        if ret > 0:
            ws_monthly[f'B{row}'].fill = PatternFill(
                start_color='C6EFCE', end_color='C6EFCE', fill_type='solid')
        else:
            ws_monthly[f'B{row}'].fill = PatternFill(
                start_color='FFC7CE', end_color='FFC7CE', fill_type='solid')
    
    wb.save(filename)
    return filename

if EXCEL_AVAILABLE:
    excel_file = generate_excel_report(portfolio_returns, benchmark_returns, metrics, weights)
    print(f"Excel report generated: {excel_file}")
else:
    print("Excel generation skipped (install openpyxl to enable)")

Exercise 13.3: Holdings Sheet Creator (Guided)

Your Task: Complete the function to create a formatted holdings sheet for an Excel report.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def create_holdings_dataframe(weights: dict, prices: pd.DataFrame) -> pd.DataFrame:
    """
    Create a holdings DataFrame with current values and metrics.
    """
    holdings = []

    for symbol, weight in weights.items():
        # Get the last price for this symbol
        last_price = prices[symbol].iloc[-1]

        # Calculate 1-day return
        daily_return = prices[symbol].pct_change().iloc[-1]

        # Calculate YTD return
        year_start = prices[prices.index.year == prices.index[-1].year][symbol].iloc[0]
        ytd_return = (last_price / year_start) - 1

        holdings.append({
            'Symbol': symbol,
            'Weight': weight,
            'Price': last_price,
            '1D Return': daily_return,
            'YTD Return': ytd_return
        })

    return pd.DataFrame(holdings)

# Test
holdings_df = create_holdings_dataframe(weights, prices)
print(holdings_df)

Exercise 13.4: Custom Report Builder (Open-ended)

Your Task:

Build a function that creates a DataFrame containing a quarterly performance summary:

  • Quarterly returns (compound daily returns)
  • Quarterly volatility (annualized)
  • Quarterly Sharpe ratio
  • Best and worst months within each quarter

Your implementation:

Exercise
Click to reveal solution
def create_quarterly_summary(returns: pd.Series) -> pd.DataFrame:
    """
    Create quarterly performance summary.
    """
    # Quarterly returns
    quarterly_return = returns.resample('Q').apply(lambda x: (1+x).prod() - 1)

    # Quarterly volatility (annualized)
    quarterly_vol = returns.resample('Q').std() * np.sqrt(252)

    # Quarterly Sharpe
    quarterly_sharpe = (returns.resample('Q').mean() * 252) / quarterly_vol

    # Monthly returns for best/worst
    monthly = returns.resample('M').apply(lambda x: (1+x).prod() - 1)

    # Best/worst months per quarter
    best_months = []
    worst_months = []

    for q_end in quarterly_return.index:
        q_start = q_end - pd.offsets.QuarterEnd(1) + pd.offsets.Day(1)
        q_months = monthly[(monthly.index >= q_start) & (monthly.index <= q_end)]
        if len(q_months) > 0:
            best_months.append(q_months.max())
            worst_months.append(q_months.min())
        else:
            best_months.append(np.nan)
            worst_months.append(np.nan)

    return pd.DataFrame({
        'Quarterly Return': quarterly_return,
        'Volatility': quarterly_vol,
        'Sharpe': quarterly_sharpe,
        'Best Month': best_months,
        'Worst Month': worst_months
    })

# Test
quarterly = create_quarterly_summary(portfolio_returns)
print(quarterly)

Section 13.4: Performance Tear Sheets

Tear sheets provide a one-page summary of strategy performance for quick evaluation.

In this section, you will learn:

  • Tear sheet design principles
  • Multi-panel layouts
  • Key visualizations

13.4.1 Creating Professional Tear Sheets

def create_tear_sheet(returns: pd.Series, benchmark_returns: pd.Series, 
                      weights: dict, strategy_name: str = 'Portfolio'):
    """
    Create a professional performance tear sheet.
    """
    # Calculate metrics
    metrics = calculate_report_metrics(returns, benchmark_returns)
    
    # Create figure
    fig = plt.figure(figsize=(12, 14))
    fig.suptitle(f'{strategy_name} Performance Tear Sheet', fontsize=16, fontweight='bold', y=0.98)
    
    gs = fig.add_gridspec(4, 2, height_ratios=[0.5, 1, 1, 1], hspace=0.3, wspace=0.3)
    
    # Row 1: Key Metrics
    ax_metrics = fig.add_subplot(gs[0, :])
    ax_metrics.axis('off')
    
    metrics_text = (
        f"Total Return: {metrics['total_return']*100:.2f}%  |  "
        f"Ann. Return: {metrics['ann_return']*100:.2f}%  |  "
        f"Volatility: {metrics['ann_volatility']*100:.2f}%  |  "
        f"Sharpe: {metrics['sharpe_ratio']:.2f}  |  "
        f"Max DD: {metrics['max_drawdown']*100:.2f}%"
    )
    ax_metrics.text(0.5, 0.5, metrics_text, transform=ax_metrics.transAxes,
                   ha='center', va='center', fontsize=11,
                   bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))
    
    # Row 2: Cumulative Returns
    ax_cum = fig.add_subplot(gs[1, :])
    cum_port = (1 + returns).cumprod()
    cum_bench = (1 + benchmark_returns).cumprod()
    
    ax_cum.plot(cum_port.index, cum_port, label='Portfolio', linewidth=2, color='#2E86AB')
    ax_cum.plot(cum_bench.index, cum_bench, label='Benchmark', linewidth=1.5, 
               linestyle='--', color='gray')
    ax_cum.set_ylabel('Cumulative Return')
    ax_cum.set_title('Cumulative Returns')
    ax_cum.legend(loc='upper left')
    ax_cum.grid(True, alpha=0.3)
    
    # Row 3: Drawdown
    ax_dd = fig.add_subplot(gs[2, :])
    running_max = cum_port.cummax()
    drawdown = (cum_port - running_max) / running_max
    
    ax_dd.fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
    ax_dd.plot(drawdown.index, drawdown * 100, color='darkred', linewidth=1)
    ax_dd.set_ylabel('Drawdown (%)')
    ax_dd.set_title('Drawdown')
    ax_dd.grid(True, alpha=0.3)
    
    # Row 4: Distribution and Allocation
    ax_dist = fig.add_subplot(gs[3, 0])
    ax_dist.hist(returns * 100, bins=50, alpha=0.7, color='steelblue', edgecolor='white')
    ax_dist.axvline(returns.mean() * 100, color='red', linestyle='--', 
                   label=f"Mean: {returns.mean()*100:.3f}%")
    ax_dist.set_xlabel('Daily Return (%)')
    ax_dist.set_ylabel('Frequency')
    ax_dist.set_title('Return Distribution')
    ax_dist.legend()
    ax_dist.grid(True, alpha=0.3)
    
    ax_pie = fig.add_subplot(gs[3, 1])
    ax_pie.pie([weights[k] for k in weights], labels=weights.keys(), autopct='%1.1f%%',
              colors=plt.cm.Set3(np.linspace(0, 1, len(weights))))
    ax_pie.set_title('Current Allocation')
    
    plt.tight_layout()
    
    filename = f'{strategy_name.lower().replace(" ", "_")}_tearsheet.png'
    plt.savefig(filename, dpi=150, bbox_inches='tight', facecolor='white')
    plt.show()
    
    return filename

# Create tear sheet
tearsheet = create_tear_sheet(portfolio_returns, benchmark_returns, weights, 'Balanced Portfolio')
print(f"\nTear sheet saved as: {tearsheet}")

Exercise 13.5: Risk Metrics Panel (Open-ended)

Your Task:

Build a function that creates a risk metrics visualization panel:

  • Rolling volatility plot (21-day window)
  • VaR histogram with 95% and 99% VaR lines marked
  • Rolling beta to benchmark (63-day window)
  • Return the figure object

Your implementation:

Solution:
def create_risk_panel(returns: pd.Series, benchmark_returns: pd.Series) -> plt.Figure:
    """
    Create a risk metrics visualization panel.
    """
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # 1. Rolling volatility
    ax1 = axes[0, 0]
    rolling_vol = returns.rolling(21).std() * np.sqrt(252) * 100
    ax1.plot(rolling_vol.index, rolling_vol, color='orange', linewidth=1)
    ax1.axhline(returns.std() * np.sqrt(252) * 100, color='black', linestyle='--', alpha=0.7)
    ax1.set_title('21-Day Rolling Volatility (%)')
    ax1.set_ylabel('Volatility (%)')
    ax1.grid(True, alpha=0.3)

    # 2. VaR histogram
    ax2 = axes[0, 1]
    var_95 = np.percentile(returns, 5)
    var_99 = np.percentile(returns, 1)
    ax2.hist(returns * 100, bins=50, alpha=0.7, color='steelblue', edgecolor='white')
    ax2.axvline(var_95 * 100, color='orange', linestyle='--', linewidth=2, label=f'95% VaR: {var_95*100:.2f}%')
    ax2.axvline(var_99 * 100, color='red', linestyle='--', linewidth=2, label=f'99% VaR: {var_99*100:.2f}%')
    ax2.set_title('Return Distribution with VaR')
    ax2.set_xlabel('Daily Return (%)')
    ax2.legend()
    ax2.grid(True, alpha=0.3)

    # 3. Rolling beta
    ax3 = axes[1, 0]
    rolling_cov = returns.rolling(63).cov(benchmark_returns)
    rolling_var = benchmark_returns.rolling(63).var()
    rolling_beta = rolling_cov / rolling_var
    ax3.plot(rolling_beta.index, rolling_beta, color='purple', linewidth=1)
    ax3.axhline(1.0, color='black', linestyle='--', alpha=0.5)
    ax3.set_title('63-Day Rolling Beta')
    ax3.set_ylabel('Beta')
    ax3.grid(True, alpha=0.3)

    # 4. Drawdown underwater chart
    ax4 = axes[1, 1]
    cum_returns = (1 + returns).cumprod()
    drawdown = (cum_returns - cum_returns.cummax()) / cum_returns.cummax()
    ax4.fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
    ax4.set_title('Drawdown')
    ax4.set_ylabel('Drawdown (%)')
    ax4.grid(True, alpha=0.3)

    plt.tight_layout()
    return fig

# Test
fig = create_risk_panel(portfolio_returns, benchmark_returns)
plt.show()

Exercise 13.6: Complete Reporting Suite (Open-ended)

Your Task:

Build a ReportGenerator class that:

  • Takes portfolio returns, benchmark returns, and weights in the constructor
  • Has a calculate_all_metrics() method returning a dictionary of metrics
  • Has a print_summary() method that prints a formatted console report
  • Has a create_charts() method that creates and saves performance/drawdown charts
  • Has a generate_report() method that calls all of the above in sequence

Your implementation:

Solution:
class ReportGenerator:
    """
    Complete financial reporting suite.
    """

    def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series, 
                 weights: dict, strategy_name: str = 'Portfolio'):
        self.returns = portfolio_returns
        self.benchmark = benchmark_returns
        self.weights = weights
        self.name = strategy_name
        self.metrics = None

    def calculate_all_metrics(self) -> dict:
        """Calculate comprehensive metrics."""
        cum_port = (1 + self.returns).cumprod()
        cum_bench = (1 + self.benchmark).cumprod()

        total_return = cum_port.iloc[-1] - 1
        bench_return = cum_bench.iloc[-1] - 1
        n_years = len(self.returns) / 252

        ann_return = (1 + total_return) ** (1/n_years) - 1
        ann_vol = self.returns.std() * np.sqrt(252)
        sharpe = ann_return / ann_vol

        running_max = cum_port.cummax()
        max_dd = ((cum_port - running_max) / running_max).min()

        self.metrics = {
            'total_return': total_return,
            'benchmark_return': bench_return,
            'active_return': total_return - bench_return,
            'ann_return': ann_return,
            'ann_volatility': ann_vol,
            'sharpe_ratio': sharpe,
            'max_drawdown': max_dd,
            'start_date': self.returns.index[0],
            'end_date': self.returns.index[-1]
        }
        return self.metrics

    def print_summary(self):
        """Print formatted summary report."""
        if self.metrics is None:
            self.calculate_all_metrics()

        print(f"\n{'='*50}")
        print(f"{self.name} Performance Report")
        print(f"{'='*50}")
        print(f"Period: {self.metrics['start_date'].strftime('%Y-%m-%d')} to {self.metrics['end_date'].strftime('%Y-%m-%d')}")
        print(f"\nRETURNS")
        print(f"  Total Return:      {self.metrics['total_return']*100:>8.2f}%")
        print(f"  Benchmark Return:  {self.metrics['benchmark_return']*100:>8.2f}%")
        print(f"  Annualized Return: {self.metrics['ann_return']*100:>8.2f}%")
        print(f"\nRISK")
        print(f"  Volatility:        {self.metrics['ann_volatility']*100:>8.2f}%")
        print(f"  Max Drawdown:      {self.metrics['max_drawdown']*100:>8.2f}%")
        print(f"  Sharpe Ratio:      {self.metrics['sharpe_ratio']:>8.2f}")
        print(f"{'='*50}\n")

    def create_charts(self, output_dir: str = '.'):
        """Create and save performance charts."""
        # Performance chart
        fig, axes = plt.subplots(2, 1, figsize=(10, 8))

        cum_port = (1 + self.returns).cumprod()
        cum_bench = (1 + self.benchmark).cumprod()

        axes[0].plot(cum_port.index, cum_port, label='Portfolio', linewidth=2)
        axes[0].plot(cum_bench.index, cum_bench, label='Benchmark', linestyle='--')
        axes[0].set_title('Cumulative Returns')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)

        drawdown = (cum_port - cum_port.cummax()) / cum_port.cummax()
        axes[1].fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
        axes[1].set_title('Drawdown')
        axes[1].set_ylabel('Drawdown (%)')
        axes[1].grid(True, alpha=0.3)

        plt.tight_layout()
        filename = f'{output_dir}/{self.name.lower().replace(" ", "_")}_charts.png'
        plt.savefig(filename, dpi=150, bbox_inches='tight')
        plt.close()
        return filename

    def generate_report(self, output_dir: str = '.'):
        """Generate complete report package."""
        self.calculate_all_metrics()
        self.print_summary()
        chart_file = self.create_charts(output_dir)
        print(f"Charts saved: {chart_file}")
        return self.metrics

# Test
reporter = ReportGenerator(portfolio_returns, benchmark_returns, weights, 'Balanced Growth')
reporter.generate_report()

Module Project: Complete Reporting Suite

Build a comprehensive reporting system that combines all concepts from this module.

Your Challenge:

Build a ReportingSuite class that includes:

  1. Comprehensive metrics calculation
  2. Console summary printing
  3. Chart generation
  4. Tear sheet creation
  5. Optional PDF and Excel generation

# YOUR CODE HERE - Module Project
Solution:
class ReportingSuite:
    """
    Professional financial reporting system.

    Features:
    - Comprehensive metrics calculation
    - Console summary reports
    - Chart generation
    - Performance tear sheets
    - Optional PDF and Excel reports
    """

    def __init__(self, portfolio_returns: pd.Series, benchmark_returns: pd.Series, 
                 weights: dict, strategy_name: str = 'Portfolio'):
        self.returns = portfolio_returns
        self.benchmark = benchmark_returns
        self.weights = weights
        self.name = strategy_name
        self.metrics = self._calculate_metrics()

    def _calculate_metrics(self) -> dict:
        """Calculate all performance metrics."""
        cum_port = (1 + self.returns).cumprod()
        cum_bench = (1 + self.benchmark).cumprod()

        total_return = cum_port.iloc[-1] - 1
        bench_return = cum_bench.iloc[-1] - 1
        n_years = len(self.returns) / 252

        ann_return = (1 + total_return) ** (1/n_years) - 1
        ann_vol = self.returns.std() * np.sqrt(252)
        sharpe = ann_return / ann_vol

        # Downside metrics
        downside = self.returns[self.returns < 0]
        sortino = ann_return / (downside.std() * np.sqrt(252))

        # Drawdown
        running_max = cum_port.cummax()
        drawdown = (cum_port - running_max) / running_max
        max_dd = drawdown.min()

        # Relative metrics
        active = self.returns - self.benchmark
        tracking_error = active.std() * np.sqrt(252)
        info_ratio = (active.mean() * 252) / tracking_error

        # Beta/Alpha
        cov = np.cov(self.returns, self.benchmark)[0, 1]
        beta = cov / self.benchmark.var()
        alpha = ann_return - beta * (self.benchmark.mean() * 252)

        return {
            'total_return': total_return,
            'benchmark_return': bench_return,
            'active_return': total_return - bench_return,
            'ann_return': ann_return,
            'ann_volatility': ann_vol,
            'sharpe_ratio': sharpe,
            'sortino_ratio': sortino,
            'max_drawdown': max_dd,
            'tracking_error': tracking_error,
            'information_ratio': info_ratio,
            'beta': beta,
            'alpha': alpha,
            'var_95': -np.percentile(self.returns, 5),
            'win_rate': (self.returns > 0).mean(),
            'start_date': self.returns.index[0],
            'end_date': self.returns.index[-1]
        }

    def print_summary(self):
        """Print formatted summary to console."""
        m = self.metrics
        print(f"\n{'='*60}")
        print(f"{self.name} Performance Report")
        print(f"{'='*60}")
        print(f"Period: {m['start_date'].strftime('%Y-%m-%d')} to {m['end_date'].strftime('%Y-%m-%d')}")
        print(f"\nRETURN METRICS")
        print(f"-" * 40)
        print(f"  Total Return:      {m['total_return']*100:>10.2f}%")
        print(f"  Benchmark Return:  {m['benchmark_return']*100:>10.2f}%")
        print(f"  Active Return:     {m['active_return']*100:>10.2f}%")
        print(f"  Annualized Return: {m['ann_return']*100:>10.2f}%")
        print(f"\nRISK METRICS")
        print(f"-" * 40)
        print(f"  Volatility:        {m['ann_volatility']*100:>10.2f}%")
        print(f"  Max Drawdown:      {m['max_drawdown']*100:>10.2f}%")
        print(f"  95% VaR:           {m['var_95']*100:>10.2f}%")
        print(f"  Tracking Error:    {m['tracking_error']*100:>10.2f}%")
        print(f"\nRISK-ADJUSTED")
        print(f"-" * 40)
        print(f"  Sharpe Ratio:      {m['sharpe_ratio']:>10.2f}")
        print(f"  Sortino Ratio:     {m['sortino_ratio']:>10.2f}")
        print(f"  Info Ratio:        {m['information_ratio']:>10.2f}")
        print(f"{'='*60}\n")

    def create_charts(self, output_dir: str = '.') -> str:
        """Generate performance and drawdown charts."""
        fig, axes = plt.subplots(2, 1, figsize=(10, 8))

        cum_port = (1 + self.returns).cumprod()
        cum_bench = (1 + self.benchmark).cumprod()

        axes[0].plot(cum_port.index, cum_port, label='Portfolio', linewidth=2, color='#2E86AB')
        axes[0].plot(cum_bench.index, cum_bench, label='Benchmark', linestyle='--', color='gray')
        axes[0].set_title(f'{self.name} - Cumulative Returns')
        axes[0].legend(loc='upper left')
        axes[0].grid(True, alpha=0.3)

        drawdown = (cum_port - cum_port.cummax()) / cum_port.cummax()
        axes[1].fill_between(drawdown.index, drawdown * 100, 0, alpha=0.5, color='red')
        axes[1].plot(drawdown.index, drawdown * 100, color='darkred', linewidth=1)
        axes[1].set_title('Drawdown')
        axes[1].set_ylabel('Drawdown (%)')
        axes[1].grid(True, alpha=0.3)

        plt.tight_layout()
        filename = f'{output_dir}/{self.name.lower().replace(" ", "_")}_charts.png'
        plt.savefig(filename, dpi=150, bbox_inches='tight')
        plt.show()
        return filename

    def create_tear_sheet(self, output_dir: str = '.') -> str:
        """Generate one-page tear sheet."""
        fig = plt.figure(figsize=(12, 14))
        fig.suptitle(f'{self.name} Tear Sheet', fontsize=16, fontweight='bold', y=0.98)

        gs = fig.add_gridspec(4, 2, height_ratios=[0.4, 1, 1, 1], hspace=0.3, wspace=0.3)

        # Metrics summary
        ax_sum = fig.add_subplot(gs[0, :])
        ax_sum.axis('off')
        m = self.metrics
        text = (f"Return: {m['total_return']*100:.1f}%  |  "
                f"Vol: {m['ann_volatility']*100:.1f}%  |  "
                f"Sharpe: {m['sharpe_ratio']:.2f}  |  "
                f"Max DD: {m['max_drawdown']*100:.1f}%")
        ax_sum.text(0.5, 0.5, text, ha='center', va='center', fontsize=12,
                   bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))

        # Cumulative returns
        ax_cum = fig.add_subplot(gs[1, :])
        cum_port = (1 + self.returns).cumprod()
        cum_bench = (1 + self.benchmark).cumprod()
        ax_cum.plot(cum_port, label='Portfolio', linewidth=2)
        ax_cum.plot(cum_bench, label='Benchmark', linestyle='--', alpha=0.7)
        ax_cum.set_title('Cumulative Returns')
        ax_cum.legend()
        ax_cum.grid(True, alpha=0.3)

        # Drawdown
        ax_dd = fig.add_subplot(gs[2, :])
        dd = (cum_port - cum_port.cummax()) / cum_port.cummax()
        ax_dd.fill_between(dd.index, dd * 100, 0, alpha=0.5, color='red')
        ax_dd.set_title('Drawdown')
        ax_dd.set_ylabel('%')
        ax_dd.grid(True, alpha=0.3)

        # Distribution
        ax_hist = fig.add_subplot(gs[3, 0])
        ax_hist.hist(self.returns * 100, bins=50, alpha=0.7, color='steelblue')
        ax_hist.axvline(self.returns.mean() * 100, color='red', linestyle='--')
        ax_hist.set_title('Return Distribution')
        ax_hist.set_xlabel('Daily Return (%)')
        ax_hist.grid(True, alpha=0.3)

        # Allocation
        ax_pie = fig.add_subplot(gs[3, 1])
        ax_pie.pie(list(self.weights.values()), labels=list(self.weights.keys()),
                  autopct='%1.1f%%', colors=plt.cm.Set3(np.linspace(0, 1, len(self.weights))))
        ax_pie.set_title('Allocation')

        plt.tight_layout()
        filename = f'{output_dir}/{self.name.lower().replace(" ", "_")}_tearsheet.png'
        plt.savefig(filename, dpi=150, bbox_inches='tight', facecolor='white')
        plt.show()
        return filename

    def generate_all_reports(self, output_dir: str = '.'):
        """Generate complete report package."""
        print(f"Generating reports for {self.name}...\n")

        self.print_summary()
        chart_file = self.create_charts(output_dir)
        tearsheet_file = self.create_tear_sheet(output_dir)

        print(f"\nReports generated:")
        print(f"  - Charts: {chart_file}")
        print(f"  - Tear Sheet: {tearsheet_file}")
        print("\nReport generation complete!")

# Demo
suite = ReportingSuite(portfolio_returns, benchmark_returns, weights, 'Balanced Growth')
suite.generate_all_reports()

Key Takeaways

What You Learned

1. Report Design Principles

  • Structure reports with executive summary first
  • Tailor content to audience (client vs internal vs regulatory)
  • Include benchmarks for context

2. PDF Generation

  • ReportLab provides professional PDF output
  • Use tables for metrics, images for charts
  • Include proper disclaimers

3. Excel Reports

  • openpyxl enables formatted workbooks
  • Color-code positive/negative values
  • Separate sheets for different data views

4. Tear Sheets

  • One-page summary of strategy performance
  • Include all key metrics and visualizations
  • Useful for marketing and quick reviews

Coming Up Next

In Module 14: Rebalancing & Execution, we'll explore:

  • Rebalancing strategies (calendar, threshold)
  • Transaction cost analysis
  • Tax-loss harvesting
  • Implementation shortfall


Congratulations on completing Module 13!

Module 14: Rebalancing & Execution

Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure


Learning Objectives

By the end of this module, you will be able to:

  1. Implement calendar and threshold rebalancing strategies
  2. Model and minimize transaction costs
  3. Apply tax-loss harvesting techniques
  4. Measure implementation shortfall and execution quality
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Modules 5 and 11

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)

print('Module 14: Rebalancing & Execution - Ready!')

Load Data

# Download data
tickers = ['SPY', 'AGG', 'GLD', 'VNQ']
data = yf.download(tickers, start='2020-01-01', end='2024-01-01', progress=False)

if isinstance(data.columns, pd.MultiIndex):
    prices = data['Adj Close'] if 'Adj Close' in data.columns.get_level_values(0) else data['Close']
else:
    prices = data

returns = prices.pct_change().dropna()

# Target allocation
target_weights = {'SPY': 0.50, 'AGG': 0.30, 'GLD': 0.10, 'VNQ': 0.10}

print(f'Data loaded: {len(returns)} trading days')

Section 14.1: Rebalancing Strategies

Over time, higher-returning assets grow to dominate your portfolio, increasing concentration risk as the allocation drifts away from its target.

In this section, you will learn:

  • Calendar rebalancing (fixed schedule)
  • Threshold rebalancing (drift-triggered)
  • Hybrid approaches

14.1.1 Portfolio Drift Simulation

def simulate_portfolio_drift(initial_weights: dict, returns_df: pd.DataFrame,
                            rebalance_frequency: str = None) -> pd.DataFrame:
    """
    Simulate how portfolio weights drift over time.
    
    Parameters:
    -----------
    initial_weights : dict
        Target weights for each asset
    returns_df : DataFrame
        Daily returns for each asset
    rebalance_frequency : str
        'M' for monthly, 'Q' for quarterly, None for never
    """
    assets = list(initial_weights.keys())
    weights = pd.DataFrame(index=returns_df.index, columns=assets, dtype=float)
    
    current_weights = np.array([initial_weights[a] for a in assets])
    
    if rebalance_frequency:
        # Rebalance on the last trading day of each period. Note: resample()
        # labels are calendar period ends, which often fall on non-trading
        # days and would never match the returns index.
        period = returns_df.index.to_period(rebalance_frequency)
        rebalance_dates = set(returns_df.index.to_series().groupby(period).max())
    else:
        rebalance_dates = set()
    
    for date in returns_df.index:
        weights.loc[date] = current_weights
        day_returns = returns_df.loc[date, assets].values
        growth = 1 + day_returns
        new_values = current_weights * growth
        current_weights = new_values / new_values.sum()
        
        if date in rebalance_dates:
            current_weights = np.array([initial_weights[a] for a in assets])
    
    return weights

# Simulate drift scenarios
weights_never = simulate_portfolio_drift(target_weights, returns, None)
weights_monthly = simulate_portfolio_drift(target_weights, returns, 'M')
weights_quarterly = simulate_portfolio_drift(target_weights, returns, 'Q')

print("Final SPY weight by rebalancing approach:")
print(f"  Never rebalanced: {weights_never['SPY'].iloc[-1]:.1%}")
print(f"  Monthly: {weights_monthly['SPY'].iloc[-1]:.1%}")
print(f"  Quarterly: {weights_quarterly['SPY'].iloc[-1]:.1%}")
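
As a hand-check on the drift mechanics inside simulate_portfolio_drift, here is a tiny two-asset example (hypothetical numbers, independent of the downloaded data): a 60/40 portfolio whose stock leg doubles while the bond leg stays flat drifts to 75/25.

```python
import numpy as np

# Hypothetical 60/40 portfolio; stock doubles, bond unchanged
weights = np.array([0.60, 0.40])
growth = np.array([2.00, 1.00])          # gross returns over the period

new_values = weights * growth            # 1.20 and 0.40
drifted = new_values / new_values.sum()  # renormalize to portfolio weights

print(drifted)  # [0.75 0.25]
```

This is exactly the per-day update in the simulation loop, applied once over a long horizon.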

14.1.2 Threshold Rebalancing

class ThresholdRebalancer:
    """
    Rebalances when any asset drifts beyond a threshold.
    """
    
    def __init__(self, target_weights: dict, threshold: float = 0.05):
        self.target_weights = target_weights
        self.threshold = threshold
        self.assets = list(target_weights.keys())
    
    def needs_rebalance(self, current_weights: dict) -> bool:
        """Check if any asset has drifted beyond threshold."""
        for asset, target in self.target_weights.items():
            current = current_weights.get(asset, 0)
            if abs(current - target) > self.threshold:
                return True
        return False
    
    def backtest(self, returns_df: pd.DataFrame, initial_value: float = 100000,
                 transaction_cost: float = 0.001) -> tuple:
        """Backtest the threshold rebalancing strategy."""
        portfolio_value = initial_value
        holdings = {a: self.target_weights[a] * portfolio_value for a in self.assets}
        
        values = []
        total_costs = 0
        num_rebalances = 0
        
        for date in returns_df.index:
            for asset in self.assets:
                holdings[asset] *= (1 + returns_df.loc[date, asset])
            
            portfolio_value = sum(holdings.values())
            current_weights = {a: holdings[a]/portfolio_value for a in self.assets}
            
            if self.needs_rebalance(current_weights):
                turnover = sum(abs(self.target_weights[a] - current_weights[a]) 
                              for a in self.assets) * portfolio_value / 2
                cost = turnover * transaction_cost
                total_costs += cost
                num_rebalances += 1
                
                portfolio_value -= cost
                holdings = {a: self.target_weights[a] * portfolio_value for a in self.assets}
            
            values.append({'date': date, 'value': portfolio_value})
        
        return pd.DataFrame(values).set_index('date'), total_costs, num_rebalances

# Compare thresholds
print("Threshold Rebalancing Comparison")
print("=" * 50)

for thresh in [0.03, 0.05, 0.10]:
    rebalancer = ThresholdRebalancer(target_weights, threshold=thresh)
    values, costs, num_rebal = rebalancer.backtest(returns)
    print(f"{thresh:.0%} Threshold: Final=${values['value'].iloc[-1]:,.0f}, "
          f"Rebalances={num_rebal}, Costs=${costs:,.0f}")

Exercise 14.1: Drift Calculator (Guided)

Your Task: Complete the function to calculate portfolio drift metrics.

Fill in the blanks to complete the function:

Solution:
def calculate_drift_metrics(current_weights: dict, target_weights: dict) -> dict:
    """
    Calculate portfolio drift from target allocation.
    """
    drifts = {}
    for asset in target_weights:
        current = current_weights.get(asset, 0)
        target = target_weights[asset]
        # Calculate absolute drift
        drifts[asset] = current - target

    # Find the maximum absolute drift
    max_drift = max(abs(d) for d in drifts.values())

    # Calculate total absolute drift
    total_drift = sum(abs(d) for d in drifts.values())

    return {
        'drifts': drifts,
        'max_drift': max_drift,
        'total_drift': total_drift
    }

# Test
current = {'SPY': 0.55, 'AGG': 0.25, 'GLD': 0.12, 'VNQ': 0.08}
metrics = calculate_drift_metrics(current, target_weights)
print(f"Max drift: {metrics['max_drift']:.1%}")
print(f"Total drift: {metrics['total_drift']:.1%}")
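
One practical use of total_drift: trading fully back to target requires one-way turnover of total_drift / 2 of portfolio value, since every dollar sold from an overweight asset buys a dollar of an underweight one. A small sketch using the same drift figures (the $1M portfolio value is a hypothetical assumption):

```python
# Hypothetical drifted vs target weights on a $1M portfolio
current = {'SPY': 0.55, 'AGG': 0.25, 'GLD': 0.12, 'VNQ': 0.08}
target  = {'SPY': 0.50, 'AGG': 0.30, 'GLD': 0.10, 'VNQ': 0.10}

total_drift = sum(abs(current[a] - target[a]) for a in target)  # 0.14
one_way_turnover = total_drift / 2                              # 0.07

portfolio_value = 1_000_000
dollars_traded = one_way_turnover * portfolio_value
print(f"Rebalancing trades ~${dollars_traded:,.0f} one-way")    # ~$70,000
```

This is the same turnover formula the ThresholdRebalancer backtest uses to charge transaction costs.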

Section 14.2: Transaction Cost Analysis

Transaction costs compound with every trade and can meaningfully erode net performance, especially for high-turnover strategies.

In this section, you will learn:

  • Types of transaction costs (explicit and implicit)
  • Market impact modeling
  • Cost-aware portfolio construction

14.2.1 Transaction Cost Model

class TransactionCostModel:
    """
    Comprehensive transaction cost estimator.
    """
    
    def __init__(self, commission_per_share: float = 0.005,
                 commission_min: float = 1.0):
        self.commission_per_share = commission_per_share
        self.commission_min = commission_min
    
    def estimate_spread_cost(self, price: float, spread_bps: float = 10) -> float:
        """Estimate bid-ask spread cost (half spread per transaction)."""
        spread = price * (spread_bps / 10000)
        return spread / 2
    
    def estimate_market_impact(self, trade_size: int, avg_daily_volume: int,
                               price: float, volatility: float = 0.02) -> float:
        """
        Estimate market impact using square-root model.
        Impact = sigma * sqrt(Q/V)
        """
        participation_rate = trade_size / avg_daily_volume
        impact_pct = volatility * np.sqrt(participation_rate)
        return price * impact_pct
    
    def total_cost(self, shares: int, price: float,
                   avg_daily_volume: int = 1000000,
                   volatility: float = 0.02) -> dict:
        """Calculate total transaction cost."""
        trade_value = shares * price
        
        commission = max(shares * self.commission_per_share, self.commission_min)
        spread_cost = self.estimate_spread_cost(price) * shares
        impact_cost = self.estimate_market_impact(
            shares, avg_daily_volume, price, volatility
        ) * shares
        
        return {
            'commission': commission,
            'spread': spread_cost,
            'market_impact': impact_cost,
            'total': commission + spread_cost + impact_cost,
            'total_bps': (commission + spread_cost + impact_cost) / trade_value * 10000
        }

# Example
cost_model = TransactionCostModel()
costs = cost_model.total_cost(shares=1000, price=150, avg_daily_volume=5_000_000)

print("Transaction Cost Breakdown (1000 shares @ $150)")
print("=" * 45)
for component, value in costs.items():
    if component == 'total_bps':
        print(f"Total: {value:.2f} bps")
    else:
        print(f"{component.capitalize()}: ${value:.2f}")
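
The square-root model implies sublinear scaling: doubling the trade size multiplies per-share impact by √2, not 2. A quick standalone check of that property, using the same formula as estimate_market_impact above (the numbers are illustrative):

```python
import numpy as np

def impact_per_share(trade_size, adv, price, vol=0.02):
    # Square-root market impact: price * vol * sqrt(Q / V)
    return price * vol * np.sqrt(trade_size / adv)

small = impact_per_share(10_000, 5_000_000, 150.0)
large = impact_per_share(20_000, 5_000_000, 150.0)

print(large / small)  # ratio is sqrt(2), about 1.414
```

This is why splitting a large order across several days can reduce total impact cost.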

Exercise 14.2: Break-Even Holding Period (Guided)

Your Task: Complete the function to calculate how long you need to hold a position for expected returns to overcome transaction costs.

Fill in the blanks to complete the function:

Solution:
def break_even_holding_period(expected_annual_return: float,
                              round_trip_cost: float) -> float:
    """
    Calculate minimum holding period for returns to exceed costs.
    """
    # Calculate daily expected return (252 trading days/year)
    daily_return = expected_annual_return / 252

    # Break-even when daily_return * days = round_trip_cost
    break_even_days = round_trip_cost / daily_return

    return break_even_days

# Test
scenarios = [
    (0.10, 0.002, "Stock 10% return, 20 bps cost"),
    (0.20, 0.002, "Growth 20% return, 20 bps cost"),
    (0.05, 0.001, "Bond 5% return, 10 bps cost")
]
for ret, cost, desc in scenarios:
    days = break_even_holding_period(ret, cost)
    print(f"{desc}: {days:.1f} days ({days/21:.1f} months)")
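
The linear formula above is a good approximation for small costs; an exact version accounts for compounding by solving (1 + r_daily)^n = 1 + cost for n. A minimal sketch of that refinement (same inputs as the solution above):

```python
import numpy as np

def break_even_exact(expected_annual_return: float, round_trip_cost: float) -> float:
    # Solve (1 + r_d)^n = 1 + cost for n, where r_d is the daily return
    daily_return = expected_annual_return / 252
    return np.log1p(round_trip_cost) / np.log1p(daily_return)

# 10% annual return, 20 bps round-trip cost
print(f"{break_even_exact(0.10, 0.002):.2f} days")  # ~5.04, close to the linear answer
```

For the small costs typical of liquid ETFs the two answers agree to within a few hundredths of a day.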

Section 14.3: Tax-Loss Harvesting

Tax-loss harvesting strategically realizes losses to offset gains and reduce tax liability.

In this section, you will learn:

  • Identifying harvest opportunities
  • Wash sale rule compliance
  • Calculating tax benefits

14.3.1 Tax-Loss Harvesting System

class TaxLossHarvester:
    """
    Implements tax-loss harvesting strategy.
    """
    
    def __init__(self, tax_rate_short: float = 0.37,
                 tax_rate_long: float = 0.20):
        self.tax_rate_short = tax_rate_short
        self.tax_rate_long = tax_rate_long
    
    def identify_opportunities(self, positions: list,
                               min_loss_pct: float = 0.05) -> list:
        """
        Find positions with harvestable losses.
        
        Parameters:
        -----------
        positions : list of dict
            Each has: symbol, cost_basis, current_value, purchase_date
        min_loss_pct : float
            Minimum loss percentage to trigger harvest
        """
        opportunities = []
        today = datetime.now()
        
        for pos in positions:
            gain_loss = pos['current_value'] - pos['cost_basis']
            gain_loss_pct = gain_loss / pos['cost_basis']
            
            if gain_loss_pct <= -min_loss_pct:
                holding_period = (today - pos['purchase_date']).days
                is_long_term = holding_period > 365  # long-term requires holding more than one year
                
                tax_rate = self.tax_rate_long if is_long_term else self.tax_rate_short
                tax_benefit = abs(gain_loss) * tax_rate
                
                opportunities.append({
                    'symbol': pos['symbol'],
                    'loss': gain_loss,
                    'loss_pct': gain_loss_pct,
                    'holding_days': holding_period,
                    'is_long_term': is_long_term,
                    'tax_benefit': tax_benefit
                })
        
        return sorted(opportunities, key=lambda x: x['tax_benefit'], reverse=True)

# Example positions
positions = [
    {'symbol': 'AAPL', 'cost_basis': 50000, 'current_value': 65000,
     'purchase_date': datetime(2023, 1, 15)},
    {'symbol': 'MSFT', 'cost_basis': 40000, 'current_value': 38000,
     'purchase_date': datetime(2024, 6, 1)},
    {'symbol': 'NVDA', 'cost_basis': 20000, 'current_value': 15000,
     'purchase_date': datetime(2024, 9, 1)},
]

harvester = TaxLossHarvester()
opportunities = harvester.identify_opportunities(positions)

print("Tax-Loss Harvesting Opportunities")
print("=" * 50)
for opp in opportunities:
    print(f"{opp['symbol']}: Loss ${opp['loss']:,.0f} ({opp['loss_pct']:.1%}), "
          f"Tax Benefit ${opp['tax_benefit']:,.0f}")

Exercise 14.3: Tax Benefit Calculator (Guided)

Your Task: Complete the function to calculate net tax savings from harvesting losses.

Fill in the blanks to complete the function:

Exercise
Click to reveal solution
def calculate_tax_savings(realized_gains: float, harvested_losses: float,
                          tax_rate: float = 0.30) -> dict:
    """
    Calculate tax savings from loss harvesting.
    """
    # Calculate tax without harvesting
    tax_without = realized_gains * tax_rate

    # Calculate net taxable gains after offsetting losses
    net_gains = max(0, realized_gains - harvested_losses)

    # Calculate tax with harvesting
    tax_with = net_gains * tax_rate

    # Calculate savings
    savings = tax_without - tax_with

    return {
        'tax_without_harvesting': tax_without,
        'tax_with_harvesting': tax_with,
        'tax_savings': savings
    }

# Test
result = calculate_tax_savings(realized_gains=15000, harvested_losses=5000)
print(f"Tax without harvesting: ${result['tax_without_harvesting']:,.0f}")
print(f"Tax with harvesting: ${result['tax_with_harvesting']:,.0f}")
print(f"Tax savings: ${result['tax_savings']:,.0f}")

Exercise 14.4: Hybrid Rebalancer (Open-ended)

Your Task:

Build a HybridRebalancer class that:

  • Checks on a calendar schedule (e.g., monthly)
  • Only rebalances if max drift exceeds a trigger threshold
  • When rebalancing, only trades assets drifted beyond a trade threshold
  • Returns the final portfolio value, total costs, and number of rebalances

Your implementation:

Exercise
Click to reveal solution
class HybridRebalancer:
    """
    Combines calendar checks with threshold triggers and partial rebalancing.
    """

    def __init__(self, target_weights: dict, check_frequency: str = 'M',
                 trigger_threshold: float = 0.05, trade_threshold: float = 0.02):
        self.target_weights = target_weights
        self.check_frequency = check_frequency
        self.trigger_threshold = trigger_threshold
        self.trade_threshold = trade_threshold
        self.assets = list(target_weights.keys())

    def backtest(self, returns_df: pd.DataFrame, initial_value: float = 100000,
                 transaction_cost: float = 0.001) -> tuple:
        portfolio_value = initial_value
        holdings = {a: self.target_weights[a] * portfolio_value for a in self.assets}

        values = []
        total_costs = 0
        num_rebalances = 0

        check_dates = set(returns_df.resample(self.check_frequency).last().index)

        for date in returns_df.index:
            for asset in self.assets:
                holdings[asset] *= (1 + returns_df.loc[date, asset])

            portfolio_value = sum(holdings.values())
            current_weights = {a: holdings[a]/portfolio_value for a in self.assets}

            if date in check_dates:
                max_drift = max(abs(current_weights[a] - self.target_weights[a]) 
                               for a in self.assets)

                if max_drift >= self.trigger_threshold:
                    new_weights = current_weights.copy()

                    for asset in self.assets:
                        drift = abs(current_weights[asset] - self.target_weights[asset])
                        if drift >= self.trade_threshold:
                            new_weights[asset] = self.target_weights[asset]

                    total = sum(new_weights.values())
                    new_weights = {a: w/total for a, w in new_weights.items()}

                    turnover = sum(abs(new_weights[a] - current_weights[a]) 
                                  for a in self.assets) / 2
                    cost = turnover * portfolio_value * transaction_cost
                    total_costs += cost
                    num_rebalances += 1

                    portfolio_value -= cost
                    holdings = {a: new_weights[a] * portfolio_value for a in self.assets}

            values.append({'date': date, 'value': portfolio_value})

        return pd.DataFrame(values).set_index('date'), total_costs, num_rebalances

# Test
hybrid = HybridRebalancer(target_weights, check_frequency='M',
                          trigger_threshold=0.05, trade_threshold=0.02)
values, costs, num_rebal = hybrid.backtest(returns)
print(f"Final Value: ${values['value'].iloc[-1]:,.0f}")
print(f"Total Costs: ${costs:,.0f}")
print(f"Rebalances: {num_rebal}")

Section 14.4: Implementation Shortfall

Implementation shortfall measures the total cost of executing a trading decision.

In this section, you will learn:

  • Components of implementation shortfall
  • VWAP and TWAP execution
  • Measuring execution quality

14.4.1 Implementation Shortfall Analysis

def calculate_implementation_shortfall(order: dict) -> dict:
    """
    Calculate implementation shortfall components.
    
    Parameters:
    -----------
    order : dict
        decision_price, arrival_price, execution_price,
        close_price, shares_ordered, shares_filled, side
    """
    side_mult = 1 if order['side'] == 'buy' else -1
    
    # Delay cost: decision to arrival
    delay_cost = side_mult * (
        order['arrival_price'] - order['decision_price']
    ) / order['decision_price']
    
    # Trading cost: arrival to execution
    trading_cost = side_mult * (
        order['execution_price'] - order['arrival_price']
    ) / order['decision_price']
    
    # Opportunity cost: unfilled portion
    unfilled = order['shares_ordered'] - order['shares_filled']
    if unfilled > 0:
        opp_cost = side_mult * unfilled / order['shares_ordered'] * (
            order['close_price'] - order['decision_price']
        ) / order['decision_price']
    else:
        opp_cost = 0
    
    return {
        'delay_cost_bps': delay_cost * 10000,
        'trading_cost_bps': trading_cost * 10000,
        'opportunity_cost_bps': opp_cost * 10000,
        'total_shortfall_bps': (delay_cost + trading_cost + opp_cost) * 10000
    }

# Example
order = {
    'side': 'buy',
    'decision_price': 185.00,
    'arrival_price': 185.20,
    'execution_price': 185.45,
    'close_price': 186.00,
    'shares_ordered': 1000,
    'shares_filled': 1000
}

shortfall = calculate_implementation_shortfall(order)
print("Implementation Shortfall")
print("=" * 40)
for component, value in shortfall.items():
    print(f"{component}: {value:.2f}")
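
The TWAP execution mentioned in this section's overview works by slicing a parent order into equal time buckets, trading each slice passively to reduce impact. A minimal sketch (the `twap_schedule` helper and its parameters are illustrative, not part of the module's code):

```python
def twap_schedule(total_shares: int, num_slices: int) -> list:
    """Split a parent order into equal time slices (TWAP).

    Remainder shares are added to the final slice so the
    schedule sums exactly to the parent order size.
    """
    base = total_shares // num_slices
    slices = [base] * num_slices
    slices[-1] += total_shares - base * num_slices
    return slices

# 1,000 shares over 6 time buckets
print(twap_schedule(1000, 6))
```

VWAP follows the same idea but weights each slice by the expected fraction of daily volume traded in that interval rather than splitting evenly.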

Exercise 14.5: Slippage Calculator (Open-ended)

Your Task:

Build a function that calculates slippage for a list of trades:

  • For buys: slippage = (actual - expected) / expected
  • For sells: slippage = (expected - actual) / expected
  • Return total slippage in dollars and weighted average in basis points

Your implementation:

Exercise
Click to reveal solution
def calculate_slippage(trades: list) -> dict:
    """
    Calculate slippage for a list of trades.

    Parameters:
    -----------
    trades : list of dict
        Each has: expected_price, actual_price, shares, side
    """
    total_expected = 0
    total_slippage_dollars = 0

    for trade in trades:
        expected = trade['expected_price']
        actual = trade['actual_price']
        shares = trade['shares']
        side = trade['side']

        trade_value = expected * shares
        total_expected += trade_value

        if side == 'buy':
            slippage = (actual - expected) * shares
        else:
            slippage = (expected - actual) * shares

        total_slippage_dollars += slippage

    weighted_slippage_bps = (total_slippage_dollars / total_expected) * 10000

    return {
        'total_slippage_dollars': total_slippage_dollars,
        'weighted_slippage_bps': weighted_slippage_bps,
        'total_trade_value': total_expected
    }

# Test
trades = [
    {'expected_price': 100.00, 'actual_price': 100.05, 'shares': 500, 'side': 'buy'},
    {'expected_price': 150.00, 'actual_price': 150.10, 'shares': 300, 'side': 'buy'},
    {'expected_price': 75.00, 'actual_price': 74.90, 'shares': 800, 'side': 'sell'},
]

result = calculate_slippage(trades)
print(f"Total Slippage: ${result['total_slippage_dollars']:.2f}")
print(f"Weighted Slippage: {result['weighted_slippage_bps']:.2f} bps")

Exercise 14.6: Complete Rebalancing Engine (Open-ended)

Your Task:

Build a RebalancingEngine class that:

  • Takes target weights and configuration options in the constructor
  • Has an analyze_portfolio() method that returns current weights, drift, and whether rebalancing is needed
  • Has a generate_trades() method that creates a trade list to reach targets
  • Has an execute_rebalance() method that simulates or executes the rebalance

Your implementation:

Exercise
Click to reveal solution
class RebalancingEngine:
    """
    Production-ready portfolio rebalancing engine.
    """

    def __init__(self, target_weights: dict, config: dict = None):
        self.target_weights = target_weights
        self.config = config or {
            'rebalance_threshold': 0.05,
            'trade_threshold': 0.02,
            'transaction_cost_bps': 10
        }
        self.assets = list(target_weights.keys())

    def analyze_portfolio(self, holdings: dict, prices: dict) -> dict:
        """Analyze current portfolio state."""
        values = {s: holdings[s] * prices[s] for s in holdings}
        total_value = sum(values.values())
        current_weights = {s: v/total_value for s, v in values.items()}

        drift = {s: current_weights.get(s, 0) - self.target_weights[s] 
                for s in self.target_weights}
        max_drift = max(abs(d) for d in drift.values())

        return {
            'total_value': total_value,
            'current_weights': current_weights,
            'drift': drift,
            'max_drift': max_drift,
            'needs_rebalance': max_drift > self.config['rebalance_threshold']
        }

    def generate_trades(self, holdings: dict, prices: dict) -> list:
        """Generate trade list."""
        analysis = self.analyze_portfolio(holdings, prices)
        trades = []

        for symbol in self.target_weights:
            drift = analysis['drift'].get(symbol, 0)
            if abs(drift) < self.config['trade_threshold']:
                continue

            target_value = self.target_weights[symbol] * analysis['total_value']
            current_value = analysis['current_weights'].get(symbol, 0) * analysis['total_value']
            trade_value = target_value - current_value
            shares = int(trade_value / prices[symbol])

            if shares != 0:
                trades.append({
                    'symbol': symbol,
                    'shares': shares,
                    'side': 'buy' if shares > 0 else 'sell',
                    'value': abs(shares * prices[symbol])
                })

        return sorted(trades, key=lambda x: (x['side'] == 'buy', -x['value']))

    def execute_rebalance(self, holdings: dict, prices: dict, dry_run: bool = True) -> dict:
        """Execute rebalance."""
        analysis = self.analyze_portfolio(holdings, prices)
        trades = self.generate_trades(holdings, prices)

        total_turnover = sum(t['value'] for t in trades)
        total_cost = total_turnover * self.config['transaction_cost_bps'] / 10000

        return {
            'portfolio_value': analysis['total_value'],
            'max_drift_before': analysis['max_drift'],
            'num_trades': len(trades),
            'total_turnover': total_turnover,
            'estimated_cost': total_cost,
            'trades': trades,
            'dry_run': dry_run
        }

# Test
engine = RebalancingEngine(target_weights)
holdings = {'SPY': 100, 'AGG': 200, 'GLD': 40, 'VNQ': 80}
prices = {'SPY': 480, 'AGG': 98, 'GLD': 210, 'VNQ': 85}

result = engine.execute_rebalance(holdings, prices, dry_run=True)
print(f"Portfolio Value: ${result['portfolio_value']:,.0f}")
print(f"Max Drift: {result['max_drift_before']:.1%}")
print(f"Trades: {result['num_trades']}")
print(f"Estimated Cost: ${result['estimated_cost']:.2f}")

Module Project: Production Rebalancing System

Build a comprehensive rebalancing system that combines all concepts.

Your Challenge:

Build a ProductionRebalancer class that includes:

  1. Multiple rebalancing strategies (calendar, threshold, hybrid)
  2. Transaction cost modeling
  3. Tax-loss harvesting integration
  4. Execution quality tracking

# YOUR CODE HERE - Module Project
Click to reveal solution
class ProductionRebalancer:
    """
    Production-ready rebalancing system.
    """

    def __init__(self, target_weights: dict, config: dict = None):
        self.target_weights = target_weights
        self.config = config or {
            'strategy': 'hybrid',
            'check_frequency': 'M',
            'trigger_threshold': 0.05,
            'trade_threshold': 0.02,
            'transaction_cost_bps': 10,
            'enable_tax_harvesting': True,
            'min_harvest_loss_pct': 0.05,
            'tax_rate': 0.30
        }
        self.assets = list(target_weights.keys())
        self.rebalance_history = []

    def analyze_portfolio(self, holdings: dict, prices: dict,
                         cost_basis: dict = None) -> dict:
        """Comprehensive portfolio analysis."""
        values = {s: holdings.get(s, 0) * prices.get(s, 0) for s in self.assets}
        total_value = sum(values.values())

        if total_value == 0:
            return {'total_value': 0, 'needs_rebalance': False}

        current_weights = {s: v/total_value for s, v in values.items()}
        drift = {s: current_weights[s] - self.target_weights[s] for s in self.assets}
        max_drift = max(abs(d) for d in drift.values())

        # Tax-loss harvesting opportunities
        harvest_opportunities = []
        if self.config['enable_tax_harvesting'] and cost_basis:
            for symbol in self.assets:
                if symbol in cost_basis and symbol in holdings:
                    current_value = values[symbol]
                    basis = cost_basis[symbol] * holdings[symbol]
                    gain_loss_pct = (current_value - basis) / basis if basis > 0 else 0

                    if gain_loss_pct <= -self.config['min_harvest_loss_pct']:
                        loss = current_value - basis
                        harvest_opportunities.append({
                            'symbol': symbol,
                            'loss': loss,
                            'tax_benefit': abs(loss) * self.config['tax_rate']
                        })

        return {
            'total_value': total_value,
            'current_weights': current_weights,
            'drift': drift,
            'max_drift': max_drift,
            'needs_rebalance': max_drift > self.config['trigger_threshold'],
            'harvest_opportunities': harvest_opportunities
        }

    def generate_trades(self, holdings: dict, prices: dict,
                       analysis: dict = None) -> list:
        """Generate optimal trade list."""
        if analysis is None:
            analysis = self.analyze_portfolio(holdings, prices)

        trades = []
        total_value = analysis['total_value']

        for symbol in self.assets:
            drift = analysis['drift'].get(symbol, 0)
            if abs(drift) < self.config['trade_threshold']:
                continue

            target_value = self.target_weights[symbol] * total_value
            current_value = analysis['current_weights'].get(symbol, 0) * total_value
            trade_value = target_value - current_value
            shares = int(trade_value / prices[symbol]) if prices[symbol] > 0 else 0

            if shares != 0:
                cost_bps = self.config['transaction_cost_bps']
                est_cost = abs(shares * prices[symbol]) * cost_bps / 10000

                trades.append({
                    'symbol': symbol,
                    'shares': shares,
                    'side': 'buy' if shares > 0 else 'sell',
                    'price': prices[symbol],
                    'value': abs(shares * prices[symbol]),
                    'estimated_cost': est_cost
                })

        return sorted(trades, key=lambda x: (x['side'] == 'buy', -x['value']))

    def execute_rebalance(self, holdings: dict, prices: dict,
                         cost_basis: dict = None, dry_run: bool = True) -> dict:
        """Execute a rebalance operation."""
        analysis = self.analyze_portfolio(holdings, prices, cost_basis)
        trades = self.generate_trades(holdings, prices, analysis)

        total_turnover = sum(t['value'] for t in trades)
        total_cost = sum(t['estimated_cost'] for t in trades)
        total_harvest = sum(o['tax_benefit'] for o in analysis.get('harvest_opportunities', []))

        summary = {
            'timestamp': datetime.now(),
            'portfolio_value': analysis['total_value'],
            'max_drift_before': analysis['max_drift'],
            'num_trades': len(trades),
            'total_turnover': total_turnover,
            'turnover_pct': total_turnover / analysis['total_value'] if analysis['total_value'] > 0 else 0,
            'estimated_cost': total_cost,
            'cost_bps': total_cost / analysis['total_value'] * 10000 if analysis['total_value'] > 0 else 0,
            'tax_harvest_benefit': total_harvest,
            'trades': trades,
            'dry_run': dry_run
        }

        if not dry_run:
            self.rebalance_history.append(summary)

        return summary

    def print_report(self, summary: dict):
        """Print formatted rebalance report."""
        print("=" * 50)
        print("REBALANCE REPORT")
        print("=" * 50)
        print(f"Status: {'DRY RUN' if summary['dry_run'] else 'EXECUTED'}")
        print(f"Portfolio Value: ${summary['portfolio_value']:,.0f}")
        print(f"Max Drift: {summary['max_drift_before']:.1%}")
        print(f"\nTrades: {summary['num_trades']}")
        print(f"Turnover: ${summary['total_turnover']:,.0f} ({summary['turnover_pct']:.1%})")
        print(f"Est. Cost: ${summary['estimated_cost']:.2f} ({summary['cost_bps']:.1f} bps)")
        if summary['tax_harvest_benefit'] > 0:
            print(f"Tax Benefit: ${summary['tax_harvest_benefit']:,.0f}")
        print("=" * 50)

# Demo
rebalancer = ProductionRebalancer(target_weights)
holdings = {'SPY': 100, 'AGG': 200, 'GLD': 40, 'VNQ': 80}
prices = {'SPY': 480, 'AGG': 98, 'GLD': 210, 'VNQ': 85}
cost_basis = {'SPY': 400, 'AGG': 105, 'GLD': 190, 'VNQ': 95}

result = rebalancer.execute_rebalance(holdings, prices, cost_basis, dry_run=True)
rebalancer.print_report(result)

Key Takeaways

What You Learned

1. Rebalancing Strategies

  • Calendar rebalancing: simple but may over/under-trade
  • Threshold rebalancing: trades only when drift exceeds limit
  • Hybrid approaches combine benefits of both

2. Transaction Costs

  • Include explicit (commissions) and implicit (spread, impact) costs
  • Market impact grows with trade size (square-root model)
  • Break-even analysis helps determine minimum holding periods
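
The square-root relationship in the bullet above can be made concrete. A minimal sketch, assuming impact of the form k × volatility × √(shares / ADV), with an illustrative coefficient k = 1 (the helper name and parameter values here are assumptions, not the module's calibration):

```python
import math

def sqrt_impact_bps(shares: int, adv: int, daily_vol: float, k: float = 1.0) -> float:
    """Square-root market impact estimate in basis points.

    impact ~ k * daily_vol * sqrt(shares / ADV); k is an
    illustrative coefficient (empirically often near 1).
    """
    return k * daily_vol * math.sqrt(shares / adv) * 10_000

# Doubling order size raises impact by sqrt(2), not 2x
for q in (10_000, 20_000, 40_000):
    print(f"{q:>6,} shares: {sqrt_impact_bps(q, adv=5_000_000, daily_vol=0.02):.1f} bps")
```

The key consequence: splitting one large order into several smaller ones reduces total impact, which is exactly what TWAP/VWAP execution exploits.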

3. Tax-Loss Harvesting

  • Can add 0.5-1% annually to after-tax returns
  • Must avoid wash sale rule (30-day window)
  • Use correlated substitutes to maintain exposure
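
The 30-day wash sale window can be checked mechanically before harvesting. A minimal sketch (the `violates_wash_sale` function and its inputs are illustrative; real compliance also covers repurchases in the 30 days before the sale and "substantially identical" securities, which this check does not attempt to model):

```python
from datetime import date

def violates_wash_sale(sale_date: date, repurchase_date: date,
                       window_days: int = 30) -> bool:
    """True if a repurchase falls within the wash sale window
    (30 days before or after the loss sale)."""
    return abs((repurchase_date - sale_date).days) <= window_days

sale = date(2024, 9, 15)
print(violates_wash_sale(sale, date(2024, 10, 1)))   # repurchase 16 days later
print(violates_wash_sale(sale, date(2024, 11, 1)))   # repurchase 47 days later
```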

4. Implementation Shortfall

  • Measures total cost of trading decisions
  • Components: delay, trading, opportunity costs
  • VWAP/TWAP algorithms help minimize impact

Coming Up Next

In Module 15: Market Microstructure, we'll explore:

  • Order books and price formation
  • Bid-ask spread dynamics
  • Market maker behavior
  • Optimal execution strategies


Congratulations on completing Module 14!

Module 15: Market Microstructure

Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure


Learning Objectives

By the end of this module, you will be able to:

  1. Understand limit order book mechanics and price-time priority
  2. Analyze bid-ask spread components and estimate spreads
  3. Model price impact using square-root and Almgren-Chriss models
  4. Implement optimal execution algorithms (TWAP, VWAP, IS)
Attribute Value
Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)
Prerequisites Module 14 (Rebalancing & Execution)

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict
from datetime import datetime, timedelta
from dataclasses import dataclass
from enum import Enum
from typing import List, Dict, Optional, Tuple
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Module 15: Market Microstructure")
print("=" * 40)

Section 15.1: Order Book Mechanics

Modern electronic markets are organized around the limit order book (LOB): the collection of resting buy and sell orders at various prices.

In this section, you will learn:

  • Order types (market, limit, stop)
  • Price-time priority matching
  • Book imbalance as a directional signal

Order Types

Order Type Description Execution
Market Execute immediately at best available price Certain execution, uncertain price
Limit Execute only at specified price or better Uncertain execution, certain price
Stop Becomes market order when price reaches trigger Risk management
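
The stop order in the table above is not implemented in the book class below, but its trigger logic is easy to sketch on its own. A minimal illustration (the `StopOrder` class is an assumption for this sketch: a buy stop triggers when the last trade price rises to the stop level, a sell stop when it falls to it):

```python
from dataclasses import dataclass

@dataclass
class StopOrder:
    side: str            # 'buy' or 'sell'
    stop_price: float
    quantity: int
    triggered: bool = False

    def check_trigger(self, last_price: float) -> bool:
        """Return True exactly once, when the trigger price is first reached;
        at that point the order would convert to a market order."""
        if self.triggered:
            return False
        if self.side == 'buy' and last_price >= self.stop_price:
            self.triggered = True
        elif self.side == 'sell' and last_price <= self.stop_price:
            self.triggered = True
        return self.triggered

stop = StopOrder(side='sell', stop_price=99.50, quantity=200)
for px in (100.10, 99.80, 99.45):
    if stop.check_trigger(px):
        print(f"Stop triggered at ${px:.2f}: send market sell for {stop.quantity}")
```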

Order Book Structure

        SELL SIDE (Asks)           |        BUY SIDE (Bids)
    Price    Quantity              |     Price    Quantity
    $100.10    500    <-- Best Ask |     $100.05    800  <-- Best Bid
    $100.15    300                 |     $100.00    1200
    $100.20    1000                |     $99.95     400

The spread is the gap between best bid and best ask.

class OrderSide(Enum):
    BUY = "buy"
    SELL = "sell"

class OrderType(Enum):
    MARKET = "market"
    LIMIT = "limit"

@dataclass
class Order:
    """Represents a single order."""
    order_id: int
    side: OrderSide
    order_type: OrderType
    price: Optional[float]
    quantity: int
    timestamp: datetime

@dataclass
class Trade:
    """Represents an executed trade."""
    trade_id: int
    price: float
    quantity: int
    aggressor_side: OrderSide
    timestamp: datetime


class LimitOrderBook:
    """
    A simple limit order book implementation.
    
    Supports:
    - Adding limit orders
    - Market orders (immediate execution)
    - Order cancellation
    - Price-time priority matching
    """
    
    def __init__(self, tick_size: float = 0.01):
        self.tick_size = tick_size
        self.bids = defaultdict(list)
        self.asks = defaultdict(list)
        self.orders = {}
        self.trades = []
        self._order_counter = 0
        self._trade_counter = 0
    
    def _round_price(self, price: float) -> float:
        return round(price / self.tick_size) * self.tick_size
    
    def best_bid(self) -> Optional[float]:
        # Filter out price levels whose queues have emptied: the
        # defaultdict keeps keys after fills and cancellations, so
        # max() over all keys could raise ValueError on an empty book.
        prices = [p for p in self.bids if self.bids[p]]
        return max(prices) if prices else None
    
    def best_ask(self) -> Optional[float]:
        prices = [p for p in self.asks if self.asks[p]]
        return min(prices) if prices else None
    
    def spread(self) -> Optional[float]:
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None
        return ask - bid
    
    def midpoint(self) -> Optional[float]:
        bid, ask = self.best_bid(), self.best_ask()
        if bid is None or ask is None:
            return None
        return (bid + ask) / 2
    
    def add_order(self, side: OrderSide, order_type: OrderType, 
                  quantity: int, price: Optional[float] = None) -> Tuple[Order, List[Trade]]:
        self._order_counter += 1
        timestamp = datetime.now()
        
        if price is not None:
            price = self._round_price(price)
        
        order = Order(
            order_id=self._order_counter,
            side=side,
            order_type=order_type,
            price=price,
            quantity=quantity,
            timestamp=timestamp
        )
        
        trades = []
        
        if order_type == OrderType.MARKET:
            trades = self._execute_market_order(order)
        else:
            trades = self._match_order(order)
            if order.quantity > 0:
                self.orders[order.order_id] = order
                if side == OrderSide.BUY:
                    self.bids[price].append(order)
                else:
                    self.asks[price].append(order)
        
        return order, trades
    
    def _execute_market_order(self, order: Order) -> List[Trade]:
        trades = []
        remaining = order.quantity
        
        if order.side == OrderSide.BUY:
            book_side = self.asks
            price_order = sorted
        else:
            book_side = self.bids
            price_order = lambda x: sorted(x, reverse=True)
        
        for price in list(price_order(book_side.keys())):
            if remaining <= 0:
                break
            orders_at_price = book_side[price]
            
            while orders_at_price and remaining > 0:
                resting_order = orders_at_price[0]
                fill_qty = min(remaining, resting_order.quantity)
                
                self._trade_counter += 1
                trade = Trade(
                    trade_id=self._trade_counter,
                    price=resting_order.price,
                    quantity=fill_qty,
                    aggressor_side=order.side,
                    timestamp=datetime.now()
                )
                trades.append(trade)
                self.trades.append(trade)
                
                remaining -= fill_qty
                resting_order.quantity -= fill_qty
                
                if resting_order.quantity == 0:
                    orders_at_price.pop(0)
                    del self.orders[resting_order.order_id]
        
        order.quantity = remaining
        return trades
    
    def _match_order(self, order: Order) -> List[Trade]:
        trades = []
        
        if order.side == OrderSide.BUY:
            while order.quantity > 0 and self.asks:
                best_ask_price = self.best_ask()
                if best_ask_price is None or order.price < best_ask_price:
                    break
                orders_at_price = self.asks[best_ask_price]
                if not orders_at_price:
                    break
                
                resting_order = orders_at_price[0]
                fill_qty = min(order.quantity, resting_order.quantity)
                
                self._trade_counter += 1
                trade = Trade(
                    trade_id=self._trade_counter,
                    price=resting_order.price,
                    quantity=fill_qty,
                    aggressor_side=order.side,
                    timestamp=datetime.now()
                )
                trades.append(trade)
                self.trades.append(trade)
                
                order.quantity -= fill_qty
                resting_order.quantity -= fill_qty
                
                if resting_order.quantity == 0:
                    orders_at_price.pop(0)
                    del self.orders[resting_order.order_id]
        else:
            while order.quantity > 0 and self.bids:
                best_bid_price = self.best_bid()
                if best_bid_price is None or order.price > best_bid_price:
                    break
                orders_at_price = self.bids[best_bid_price]
                if not orders_at_price:
                    break
                
                resting_order = orders_at_price[0]
                fill_qty = min(order.quantity, resting_order.quantity)
                
                self._trade_counter += 1
                trade = Trade(
                    trade_id=self._trade_counter,
                    price=resting_order.price,
                    quantity=fill_qty,
                    aggressor_side=order.side,
                    timestamp=datetime.now()
                )
                trades.append(trade)
                self.trades.append(trade)
                
                order.quantity -= fill_qty
                resting_order.quantity -= fill_qty
                
                if resting_order.quantity == 0:
                    orders_at_price.pop(0)
                    del self.orders[resting_order.order_id]
        
        return trades
    
    def cancel_order(self, order_id: int) -> bool:
        if order_id not in self.orders:
            return False
        order = self.orders[order_id]
        if order.side == OrderSide.BUY:
            self.bids[order.price].remove(order)
        else:
            self.asks[order.price].remove(order)
        del self.orders[order_id]
        return True
    
    def get_book_state(self, levels: int = 5) -> Dict:
        bid_prices = sorted([p for p in self.bids if self.bids[p]], reverse=True)[:levels]
        ask_prices = sorted([p for p in self.asks if self.asks[p]])[:levels]
        
        bids = [(p, sum(o.quantity for o in self.bids[p])) for p in bid_prices]
        asks = [(p, sum(o.quantity for o in self.asks[p])) for p in ask_prices]
        
        return {
            'bids': bids,
            'asks': asks,
            'best_bid': self.best_bid(),
            'best_ask': self.best_ask(),
            'spread': self.spread(),
            'midpoint': self.midpoint()
        }
    
    def display(self):
        state = self.get_book_state()
        print("\n" + "="*50)
        print("ORDER BOOK")
        print("="*50)
        print(f"Spread: ${state['spread']:.2f}" if state['spread'] else "Spread: N/A")
        print(f"Midpoint: ${state['midpoint']:.2f}" if state['midpoint'] else "Midpoint: N/A")
        print("-"*50)
        print(f"{'ASK':^25} | {'BID':^22}")
        print(f"{'Price':>12} {'Qty':>10} | {'Price':>10} {'Qty':>10}")
        print("-"*50)
        
        asks = state['asks'][::-1]
        bids = state['bids']
        max_rows = max(len(asks), len(bids))
        
        for i in range(max_rows):
            ask_str = f"${asks[i][0]:>10.2f} {asks[i][1]:>10,}" if i < len(asks) else " "*23
            bid_str = f"${bids[i][0]:>9.2f} {bids[i][1]:>10,}" if i < len(bids) else " "*22
            print(f"{ask_str} | {bid_str}")
        print("="*50)
# Create and populate an order book
book = LimitOrderBook(tick_size=0.01)

# Add buy orders (bids)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 500, 100.00)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 800, 99.95)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 1200, 99.90)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 300, 99.85)
book.add_order(OrderSide.BUY, OrderType.LIMIT, 600, 99.80)

# Add sell orders (asks)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 400, 100.05)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 700, 100.10)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 1000, 100.15)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 500, 100.20)
book.add_order(OrderSide.SELL, OrderType.LIMIT, 900, 100.25)

print("Initial Order Book State:")
book.display()
# Simulate market order execution
print("Submitting: BUY 600 shares at MARKET")
order, trades = book.add_order(OrderSide.BUY, OrderType.MARKET, 600)

print(f"\nExecuted {len(trades)} trade(s):")
for t in trades:
    print(f"  {t.quantity} shares @ ${t.price:.2f}")

avg_price = sum(t.price * t.quantity for t in trades) / sum(t.quantity for t in trades)
print(f"\nAverage execution price: ${avg_price:.2f}")
print(f"Midpoint was: ${(100.00 + 100.05)/2:.2f}")
print(f"Slippage: ${avg_price - 100.025:.4f}")

print("\nOrder Book After Market Buy:")
book.display()

Exercise 15.1: Book Imbalance Calculator (Guided)

Your Task: Calculate order book imbalance, the normalized difference between bid and ask volume:

$$\text{Imbalance} = \frac{V_{\text{bid}} - V_{\text{ask}}}{V_{\text{bid}} + V_{\text{ask}}}$$

The result lies between -1 (all asks) and +1 (all bids).
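Before filling in the function, the formula can be sanity-checked on hypothetical volumes with a standalone sketch (no order book needed):

```python
def imbalance(bid_volume: float, ask_volume: float) -> float:
    """(Bid - Ask) / (Bid + Ask); defined as 0.0 for an empty book."""
    total = bid_volume + ask_volume
    return (bid_volume - ask_volume) / total if total > 0 else 0.0

print(imbalance(1000, 500))   # bids dominate: positive, about +0.33
print(imbalance(500, 1000))   # asks dominate: negative, about -0.33
print(imbalance(750, 750))    # balanced: 0.0
```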

Fill in the blanks to complete the function:

Solution:
def calculate_book_imbalance(book: LimitOrderBook, levels: int = 3) -> float:
    """Calculate order book imbalance from top levels."""
    state = book.get_book_state(levels=levels)

    # Sum displayed volume on each side of the book
    bid_volume = sum(qty for _, qty in state['bids'])
    ask_volume = sum(qty for _, qty in state['asks'])

    total_volume = bid_volume + ask_volume

    # Treat an empty book as neutral
    if total_volume == 0:
        return 0.0

    imbalance = (bid_volume - ask_volume) / total_volume

    return imbalance

# Test
imbalance = calculate_book_imbalance(book, levels=3)
print(f"Book Imbalance (top 3 levels): {imbalance:.2%}")

if imbalance > 0.1:
    print("Interpretation: More bid volume - bullish pressure")
elif imbalance < -0.1:
    print("Interpretation: More ask volume - bearish pressure")
else:
    print("Interpretation: Balanced book")

Section 15.2: Bid-Ask Spread Analysis

The bid-ask spread is the most fundamental transaction cost. Understanding its components helps predict trading costs.

In this section, you will learn:

  • Spread components (order processing, inventory, adverse selection)
  • Types of spreads (quoted, effective, realized)
  • Roll model for spread estimation

Spread Components

The spread compensates market makers for:

  1. Order Processing Costs - Fixed costs of maintaining systems
  2. Inventory Risk - Risk of holding inventory that may lose value
  3. Adverse Selection - Risk of trading with informed traders

Types of Spreads

  • Quoted Spread: Best ask - Best bid
  • Effective Spread: 2 × |Trade price - Midpoint|
  • Realized Spread: Effective spread minus subsequent price change
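The three measures can be worked through by hand on a single hypothetical trade before building the analyzer below (all numbers are made up for illustration, and the realized-spread sign convention shown is for a buyer-initiated trade):

```python
# Quote: bid 99.98 / ask 100.02 -> midpoint 100.00
bid, ask = 99.98, 100.02
midpoint = (bid + ask) / 2

quoted_spread = ask - bid                           # ~0.04

# A buy executes one cent through the ask, at 100.03
trade_price = 100.03
effective_spread = 2 * abs(trade_price - midpoint)  # ~0.06

# Some time later the midpoint has drifted up to 100.02.
# Realized spread (for a buy) nets out that post-trade move:
later_midpoint = 100.02
realized_spread = 2 * (trade_price - later_midpoint)  # ~0.02

print(f"quoted={quoted_spread:.4f} effective={effective_spread:.4f} realized={realized_spread:.4f}")
```

The gap between effective and realized spread (4 cents here) is a common proxy for adverse selection: the price moved against the liquidity provider after the trade.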
class SpreadAnalyzer:
    """Analyzes bid-ask spread characteristics."""
    
    def __init__(self):
        self.quotes = []
        self.trades = []
    
    def add_quote(self, timestamp: datetime, bid: float, ask: float):
        """Record a quote update."""
        self.quotes.append({
            'timestamp': timestamp,
            'bid': bid,
            'ask': ask,
            'midpoint': (bid + ask) / 2,
            'quoted_spread': ask - bid,
            'quoted_spread_bps': (ask - bid) / ((bid + ask) / 2) * 10000
        })
    
    def add_trade(self, timestamp: datetime, price: float, side: str):
        """Record a trade."""
        quote_at_trade = None
        for q in reversed(self.quotes):
            if q['timestamp'] <= timestamp:
                quote_at_trade = q
                break
        
        if quote_at_trade:
            midpoint = quote_at_trade['midpoint']
            effective_half_spread = abs(price - midpoint)
            effective_spread = 2 * effective_half_spread
            
            self.trades.append({
                'timestamp': timestamp,
                'price': price,
                'side': side,
                'midpoint_at_trade': midpoint,
                'effective_spread': effective_spread,
                'effective_spread_bps': effective_spread / midpoint * 10000
            })
    
    def summary_stats(self) -> Optional[Dict]:
        """Get summary statistics."""
        if not self.quotes:
            return None
        
        quoted_spreads = [q['quoted_spread_bps'] for q in self.quotes]
        
        stats = {
            'num_quotes': len(self.quotes),
            'avg_quoted_spread_bps': np.mean(quoted_spreads),
            'median_quoted_spread_bps': np.median(quoted_spreads),
            'min_quoted_spread_bps': np.min(quoted_spreads),
            'max_quoted_spread_bps': np.max(quoted_spreads),
        }
        
        if self.trades:
            effective_spreads = [t['effective_spread_bps'] for t in self.trades]
            stats['num_trades'] = len(self.trades)
            stats['avg_effective_spread_bps'] = np.mean(effective_spreads)
        
        return stats
# Simulate quote and trade data
np.random.seed(42)
analyzer = SpreadAnalyzer()

base_time = datetime.now()
base_mid = 100.0
base_spread = 0.05

for i in range(100):
    timestamp = base_time + timedelta(seconds=i*10)
    base_mid += np.random.normal(0, 0.02)
    spread = max(0.01, base_spread + np.random.normal(0, 0.01))
    
    bid = base_mid - spread/2
    ask = base_mid + spread/2
    analyzer.add_quote(timestamp, bid, ask)
    
    if np.random.random() < 0.3:
        side = 'buy' if np.random.random() < 0.5 else 'sell'
        price = ask + np.random.uniform(0, 0.01) if side == 'buy' else bid - np.random.uniform(0, 0.01)
        analyzer.add_trade(timestamp, price, side)

stats = analyzer.summary_stats()
print("Spread Analysis Summary")
print("=" * 40)
for key, value in stats.items():
    if 'bps' in key:
        print(f"{key}: {value:.2f}")
    else:
        print(f"{key}: {value}")
# Visualize spread over time
df_quotes = pd.DataFrame(analyzer.quotes)

fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)

axes[0].fill_between(df_quotes['timestamp'], df_quotes['bid'], df_quotes['ask'], 
                     alpha=0.3, label='Bid-Ask Range')
axes[0].plot(df_quotes['timestamp'], df_quotes['midpoint'], 
             label='Midpoint', color='blue', linewidth=1)
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].set_title('Quote Evolution')

axes[1].plot(df_quotes['timestamp'], df_quotes['quoted_spread_bps'], color='red', linewidth=1)
axes[1].axhline(df_quotes['quoted_spread_bps'].mean(), color='red', 
                linestyle='--', label=f"Mean: {df_quotes['quoted_spread_bps'].mean():.1f} bps")
axes[1].set_ylabel('Spread (bps)')
axes[1].set_xlabel('Time')
axes[1].legend()
axes[1].set_title('Quoted Spread Over Time')

plt.tight_layout()
plt.show()

Exercise 15.2: Roll Spread Estimator (Guided)

Your Task: Estimate the bid-ask spread using the Roll (1984) model.

The Roll model estimates spread from the autocovariance of price changes:

$$\text{Spread} = 2\sqrt{-\text{Cov}(\Delta P_t, \Delta P_{t-1})}$$

(Only valid when covariance is negative)

Fill in the blanks:

Solution:
def estimate_spread_roll(prices: np.ndarray) -> Optional[float]:
    """Estimate bid-ask spread using Roll (1984) model."""
    prices = np.array(prices)

    # Calculate price changes
    delta_p = np.diff(prices)

    # Calculate autocovariance
    cov = np.cov(delta_p[1:], delta_p[:-1])[0, 1]

    # Roll model only applies when covariance is negative
    if cov >= 0:
        return None

    # Spread = 2 * sqrt(-cov)
    spread = 2 * np.sqrt(-cov)

    return spread

# Test with simulated trade prices
np.random.seed(123)
true_spread = 0.05
true_mid = 100.0

# Efficient price random walk
efficient_prices = [true_mid]
for _ in range(500):
    efficient_prices.append(efficient_prices[-1] + np.random.normal(0, 0.02))

# Transaction prices alternate bid/ask
transaction_prices = []
for eff_p in efficient_prices:
    if np.random.random() < 0.5:
        transaction_prices.append(eff_p + true_spread/2)
    else:
        transaction_prices.append(eff_p - true_spread/2)

estimated_spread = estimate_spread_roll(transaction_prices)
print(f"True spread: ${true_spread:.4f}")
print(f"Roll estimate: ${estimated_spread:.4f}" if estimated_spread else "Roll model not applicable")

Section 15.3: Price Impact Models

When you trade, you move the price. This price impact has two components:

  1. Temporary Impact: Immediate price pressure that reverses
  2. Permanent Impact: Information revealed by your trade

In this section, you will learn:

  • Square-root impact model
  • Almgren-Chriss framework
  • Impact estimation from trade data

The Square-Root Model

The most widely-used impact model:

$$\text{Impact} = \sigma \cdot \sqrt{\frac{Q}{V}}$$

Where:

  • $\sigma$ = daily volatility
  • $Q$ = trade quantity
  • $V$ = average daily volume
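Plugging in hypothetical numbers: with $\sigma$ = 2% and an ADV of 1M shares, a 10,000-share order (1% of ADV) has expected impact $0.02 \times \sqrt{0.01} = 0.002$, or 20 bps:

```python
import math

sigma = 0.02        # daily volatility (2%)
adv = 1_000_000     # average daily volume
quantity = 10_000   # order size: 1% of ADV

impact = sigma * math.sqrt(quantity / adv)
print(f"Expected impact: {impact:.4f} ({impact * 10_000:.0f} bps)")
```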

class PriceImpactModel:
    """Models price impact of trading."""
    
    def __init__(self, sigma: float = 0.02, avg_daily_volume: int = 1000000):
        self.sigma = sigma
        self.adv = avg_daily_volume
    
    def square_root_impact(self, quantity: int, permanent_fraction: float = 0.5) -> Dict:
        """Calculate impact using square-root model."""
        participation = quantity / self.adv
        
        total_impact = self.sigma * np.sqrt(participation)
        permanent = total_impact * permanent_fraction
        temporary = total_impact * (1 - permanent_fraction)
        
        return {
            'total_impact_pct': total_impact,
            'permanent_pct': permanent,
            'temporary_pct': temporary,
            'participation_rate': participation,
            'total_impact_bps': total_impact * 10000,
            'permanent_bps': permanent * 10000,
            'temporary_bps': temporary * 10000
        }
    
    def almgren_chriss_cost(self, quantity: int, time_horizon: float, 
                            risk_aversion: float = 1e-6) -> Dict:
        """Calculate expected cost using Almgren-Chriss model."""
        eta = 0.01 * self.sigma / np.sqrt(self.adv)
        gamma = 0.1 * eta
        
        kappa = np.sqrt(risk_aversion * self.sigma**2 / eta)
        
        permanent_cost = 0.5 * gamma * quantity**2
        temporary_cost = eta * quantity**2 / (2 * time_horizon)
        timing_risk = 0.5 * risk_aversion * self.sigma**2 * quantity**2 * time_horizon
        
        total_cost = permanent_cost + temporary_cost + timing_risk
        
        return {
            'permanent_cost': permanent_cost,
            'temporary_cost': temporary_cost,
            'timing_risk_cost': timing_risk,
            'total_expected_cost': total_cost,
            'cost_per_share': total_cost / quantity,
            'optimal_kappa': kappa
        }
# Example: Impact of different trade sizes
model = PriceImpactModel(sigma=0.02, avg_daily_volume=1_000_000)

print("Price Impact Analysis")
print("=" * 60)
print(f"Stock: $100, Daily Vol: 1M shares, Volatility: 2%")
print()
print(f"{'Trade Size':>12} {'% ADV':>8} {'Impact (bps)':>14} {'Perm (bps)':>12} {'Temp (bps)':>12}")
print("-" * 60)

for qty in [1000, 5000, 10000, 50000, 100000, 500000]:
    impact = model.square_root_impact(qty)
    print(f"{qty:>12,} {impact['participation_rate']:>7.1%} {impact['total_impact_bps']:>14.1f} "
          f"{impact['permanent_bps']:>12.1f} {impact['temporary_bps']:>12.1f}")
# Visualize impact curve
quantities = np.linspace(1000, 500000, 100)
impacts = [model.square_root_impact(q)['total_impact_bps'] for q in quantities]
participation_rates = [q / model.adv * 100 for q in quantities]

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(quantities / 1000, impacts, linewidth=2)
axes[0].set_xlabel('Trade Size (thousands of shares)')
axes[0].set_ylabel('Expected Impact (bps)')
axes[0].set_title('Price Impact vs Trade Size')
axes[0].grid(True, alpha=0.3)

axes[1].plot(participation_rates, impacts, linewidth=2, color='red')
axes[1].set_xlabel('Participation Rate (% of ADV)')
axes[1].set_ylabel('Expected Impact (bps)')
axes[1].set_title('Price Impact vs Participation Rate')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKey insight: Impact grows with square root of trade size")
print("Trading 4x the volume only doubles the impact")
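That square-root scaling can be checked directly: since $\sqrt{4Q/V} = 2\sqrt{Q/V}$, quadrupling the order size doubles the expected impact regardless of the other parameters. A standalone check (hypothetical parameters):

```python
import math

def sqrt_impact(quantity: int, sigma: float = 0.02, adv: int = 1_000_000) -> float:
    """Square-root impact model: sigma * sqrt(Q / V)."""
    return sigma * math.sqrt(quantity / adv)

ratio = sqrt_impact(100_000) / sqrt_impact(25_000)
print(f"Impact ratio for 4x the size: {ratio:.2f}")  # ~2.00
```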

Exercise 15.3: Optimal Trade Horizon (Guided)

Your Task: Calculate the optimal execution horizon given trade size and urgency.

The trade-off is:

  • Trade faster → higher market impact
  • Trade slower → more timing/volatility risk

Fill in the blanks:

Solution:
def optimal_trade_horizon(quantity: int, adv: int, volatility: float, 
                          urgency: float = 1.0) -> float:
    """
    Calculate optimal execution horizon.

    Returns horizon in trading days.
    """
    participation = quantity / adv

    # Larger orders (higher participation) need more time to work
    base_horizon = np.sqrt(participation) * 2

    # Higher volatility means more timing risk, so trade faster
    vol_adjustment = 0.02 / volatility

    # Higher urgency shortens the horizon proportionally
    urgency_adjustment = 1 / urgency

    optimal_horizon = base_horizon * vol_adjustment * urgency_adjustment

    # Clamp to a practical range: 0.1 to 5 trading days
    optimal_horizon = np.clip(optimal_horizon, 0.1, 5.0)

    return optimal_horizon

# Test with different scenarios
print("Optimal Trade Horizons")
print("=" * 60)

scenarios = [
    (50000, 1000000, 0.02, 1.0, "Base case"),
    (50000, 1000000, 0.02, 2.0, "High urgency"),
    (50000, 1000000, 0.04, 1.0, "High volatility"),
    (200000, 1000000, 0.02, 1.0, "Large order"),
    (50000, 5000000, 0.02, 1.0, "Liquid stock"),
]

for qty, adv, vol, urg, desc in scenarios:
    horizon = optimal_trade_horizon(qty, adv, vol, urg)
    print(f"{desc}:")
    print(f"  Order: {qty:,} shares ({qty/adv:.1%} of ADV)")
    print(f"  Optimal horizon: {horizon:.2f} days ({horizon*6.5:.1f} hours)")
    print()

Section 15.4: Optimal Execution

Given price impact, how should we optimally execute large orders?

In this section, you will learn:

  • TWAP (Time-Weighted Average Price)
  • VWAP (Volume-Weighted Average Price)
  • Almgren-Chriss optimal trajectory
  • Implementation shortfall algorithms

Key Algorithms

Algorithm                  Strategy                      Best For
TWAP                       Equal slices over time        Low-urgency, uniform volume
VWAP                       Match market volume profile   Benchmark matching
Implementation Shortfall   Minimize expected shortfall   Alpha-decay situations
Participation              Fixed % of market volume      Large orders, patient
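The table's last strategy, participation (percent-of-volume), is not implemented in the class below; a minimal sketch, assuming a hypothetical per-interval volume forecast, trades a fixed fraction of expected market volume until the parent order is done:

```python
from typing import List

def participation_schedule(total_quantity: int, expected_volume: List[int],
                           rate: float = 0.10) -> List[int]:
    """Trade `rate` of each interval's expected market volume until filled."""
    schedule = []
    remaining = total_quantity
    for vol in expected_volume:
        qty = min(remaining, int(vol * rate))  # never exceed what is left
        schedule.append(qty)
        remaining -= qty
    return schedule

# 10% participation against a made-up U-shaped volume profile
print(participation_schedule(50_000, [200_000, 120_000, 80_000, 80_000, 120_000, 200_000]))
```

A production POV algorithm would track realized rather than forecast volume, but the shape of the schedule is the same: the order completes as soon as cumulative participation covers it.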
class ExecutionAlgorithm:
    """Implementation of common execution algorithms."""
    
    def __init__(self, total_quantity: int, time_horizon: float, num_slices: int = 20):
        self.total_quantity = total_quantity
        self.time_horizon = time_horizon
        self.num_slices = num_slices
        self.times = np.linspace(0, time_horizon, num_slices + 1)
    
    def twap(self) -> Dict:
        """Time-Weighted Average Price - equal amounts at regular intervals."""
        slice_qty = self.total_quantity // self.num_slices
        remainder = self.total_quantity % self.num_slices
        
        quantities = [slice_qty] * self.num_slices
        quantities[-1] += remainder
        
        return {
            'name': 'TWAP',
            'times': self.times[1:],
            'quantities': quantities,
            'cumulative': np.cumsum(quantities)
        }
    
    def vwap(self, volume_profile: np.ndarray = None) -> Dict:
        """Volume-Weighted Average Price - proportional to expected volume."""
        if volume_profile is None:
            volume_profile = self._default_volume_profile()
        
        if len(volume_profile) != self.num_slices:
            volume_profile = np.interp(
                np.linspace(0, 1, self.num_slices),
                np.linspace(0, 1, len(volume_profile)),
                volume_profile
            )
        
        volume_profile = np.array(volume_profile)
        volume_pct = volume_profile / volume_profile.sum()
        quantities = np.round(self.total_quantity * volume_pct).astype(int)
        
        diff = self.total_quantity - quantities.sum()
        quantities[-1] += diff
        
        return {
            'name': 'VWAP',
            'times': self.times[1:],
            'quantities': list(quantities),
            'cumulative': np.cumsum(quantities)
        }
    
    def almgren_chriss(self, risk_aversion: float = 1e-6, sigma: float = 0.02) -> Dict:
        """Almgren-Chriss optimal execution trajectory."""
        eta = 0.01
        kappa = np.sqrt(risk_aversion * sigma**2 / eta)
        
        T = self.time_horizon
        positions = []
        
        for t in self.times:
            if t >= T:
                pos = 0
            else:
                pos = self.total_quantity * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)
            positions.append(pos)
        
        positions = np.array(positions)
        quantities = -np.diff(positions)
        quantities = np.round(quantities).astype(int)
        
        diff = self.total_quantity - quantities.sum()
        quantities[-1] += diff
        
        return {
            'name': 'Almgren-Chriss',
            'times': self.times[1:],
            'quantities': list(quantities),
            'cumulative': np.cumsum(quantities),
            'kappa': kappa
        }
    
    def _default_volume_profile(self) -> np.ndarray:
        """Default U-shaped intraday volume profile."""
        x = np.linspace(0, 1, self.num_slices)
        profile = 1 + 2 * (x - 0.5)**2
        return profile
# Compare execution algorithms
algo = ExecutionAlgorithm(total_quantity=100000, time_horizon=1.0, num_slices=20)

twap = algo.twap()
vwap = algo.vwap()
ac_low = algo.almgren_chriss(risk_aversion=1e-7)
ac_high = algo.almgren_chriss(risk_aversion=1e-5)

print("Execution Algorithm Comparison")
print("=" * 50)
print(f"Order: 100,000 shares over 1 day ({algo.num_slices} slices)")
# Visualize execution trajectories
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for strategy, color, ls in [(twap, 'blue', '-'), (vwap, 'green', '-'), 
                             (ac_low, 'red', '--'), (ac_high, 'orange', '--')]:
    label = strategy['name']
    if 'Almgren' in label:
        label += f" (kappa={strategy['kappa']:.1f})"
    axes[0].plot(strategy['times'], strategy['cumulative'], 
                 label=label, color=color, linestyle=ls, linewidth=2)

axes[0].set_xlabel('Time (days)')
axes[0].set_ylabel('Cumulative Shares Executed')
axes[0].set_title('Execution Trajectories')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

width = 0.02
x = np.array(twap['times'])

axes[1].bar(x - 1.5*width, twap['quantities'], width, label='TWAP', alpha=0.7)
axes[1].bar(x - 0.5*width, vwap['quantities'], width, label='VWAP', alpha=0.7)
axes[1].bar(x + 0.5*width, ac_low['quantities'], width, label='AC (Patient)', alpha=0.7)
axes[1].bar(x + 1.5*width, ac_high['quantities'], width, label='AC (Urgent)', alpha=0.7)

axes[1].set_xlabel('Time (days)')
axes[1].set_ylabel('Shares per Slice')
axes[1].set_title('Execution Rate by Time')
axes[1].legend()

plt.tight_layout()
plt.show()

Exercise 15.4: Implementation Shortfall Schedule (Open-ended)

Your Task:

Build a function that generates an execution schedule that minimizes implementation shortfall when alpha decays over time.

The function should:

  • Front-load execution to capture alpha before it decays
  • Use exponential decay weighting
  • Return a list of quantities for each slice
  • Higher alpha_decay_rate = more aggressive front-loading

Your implementation:

Solution:
def implementation_shortfall_schedule(total_quantity: int, num_slices: int, 
                                       alpha_decay_rate: float = 0.1) -> List[int]:
    """
    Generate execution schedule that minimizes implementation shortfall.

    Uses exponential decay weighting - trade more early when alpha
    is strongest, less as alpha decays.
    """
    # Exponential decay weights
    times = np.arange(num_slices)
    weights = np.exp(-alpha_decay_rate * times)

    # Normalize
    weights = weights / weights.sum()

    # Allocate quantities
    quantities = np.round(total_quantity * weights).astype(int)

    # Adjust for rounding
    diff = total_quantity - quantities.sum()
    quantities[0] += diff

    return list(quantities)

# Compare IS schedules with different decay rates
total_qty = 100000
num_slices = 20

schedules = {
    'TWAP (baseline)': [total_qty // num_slices] * num_slices,
    'IS (slow decay)': implementation_shortfall_schedule(total_qty, num_slices, 0.05),
    'IS (medium decay)': implementation_shortfall_schedule(total_qty, num_slices, 0.15),
    'IS (fast decay)': implementation_shortfall_schedule(total_qty, num_slices, 0.30),
}

# Visualize
fig, ax = plt.subplots(figsize=(12, 5))
x = np.arange(num_slices)
width = 0.2

for i, (name, schedule) in enumerate(schedules.items()):
    ax.bar(x + i*width, schedule, width, label=name, alpha=0.7)

ax.set_xlabel('Time Slice')
ax.set_ylabel('Shares per Slice')
ax.set_title('Implementation Shortfall Algorithm - Front-Loading Comparison')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print("\nFirst 5 slices for each strategy:")
for name, schedule in schedules.items():
    first_five_pct = sum(schedule[:5]) / sum(schedule) * 100
    print(f"{name}: {schedule[:5]} ({first_five_pct:.1f}% in first 25% of time)")

Exercise 15.5: VWAP Tracker (Open-ended)

Your Task:

Build a class that tracks VWAP execution performance in real-time.

The class should:

  • Track actual executions vs planned VWAP schedule
  • Calculate slippage vs VWAP benchmark
  • Provide metrics on execution quality
  • Handle partial fills and timing deviations

Your implementation:

Solution:
class VWAPTracker:
    """Tracks VWAP execution performance in real-time."""

    def __init__(self, target_quantity: int, planned_schedule: List[int], 
                 benchmark_vwap: float):
        self.target_quantity = target_quantity
        self.planned_schedule = planned_schedule
        self.benchmark_vwap = benchmark_vwap

        self.fills = []
        self.total_executed = 0
        self.total_cost = 0.0

    def record_fill(self, timestamp: datetime, quantity: int, price: float):
        """Record an execution fill."""
        self.fills.append({
            'timestamp': timestamp,
            'quantity': quantity,
            'price': price,
            'cost': quantity * price
        })
        self.total_executed += quantity
        self.total_cost += quantity * price

    def get_execution_stats(self) -> Dict:
        """Get execution quality statistics."""
        if self.total_executed == 0:
            return {'error': 'No executions recorded'}

        actual_vwap = self.total_cost / self.total_executed
        slippage = actual_vwap - self.benchmark_vwap
        slippage_bps = slippage / self.benchmark_vwap * 10000
        completion_rate = self.total_executed / self.target_quantity

        return {
            'actual_vwap': actual_vwap,
            'benchmark_vwap': self.benchmark_vwap,
            'slippage': slippage,
            'slippage_bps': slippage_bps,
            'total_executed': self.total_executed,
            'target_quantity': self.target_quantity,
            'completion_rate': completion_rate,
            'num_fills': len(self.fills)
        }

# Test the tracker
tracker = VWAPTracker(
    target_quantity=10000,
    planned_schedule=[2000, 1500, 1500, 2000, 3000],
    benchmark_vwap=100.50
)

# Simulate fills with some slippage
np.random.seed(42)
base_time = datetime.now()

fills = [
    (2000, 100.52),
    (1500, 100.48),
    (1500, 100.55),
    (2000, 100.51),
    (3000, 100.53)
]

for i, (qty, price) in enumerate(fills):
    timestamp = base_time + timedelta(minutes=i*30)
    tracker.record_fill(timestamp, qty, price)

stats = tracker.get_execution_stats()
print("VWAP Execution Report")
print("=" * 40)
print(f"Benchmark VWAP: ${stats['benchmark_vwap']:.4f}")
print(f"Actual VWAP: ${stats['actual_vwap']:.4f}")
print(f"Slippage: {stats['slippage_bps']:.2f} bps")
print(f"Completion: {stats['completion_rate']:.1%}")

Exercise 15.6: Complete Microstructure Analyzer (Open-ended)

Your Task:

Build a comprehensive MicrostructureAnalyzer class that combines all the concepts.

The class should:

  • Maintain an order book and spread analyzer
  • Simulate market activity with random orders/trades
  • Calculate comprehensive metrics (spreads, imbalance, impact)
  • Compare execution algorithms
  • Generate a formatted report

Your implementation:

Solution:
class MicrostructureAnalyzer:
    """Comprehensive market microstructure analysis tool."""

    def __init__(self, symbol: str = 'SAMPLE', tick_size: float = 0.01):
        self.symbol = symbol
        self.tick_size = tick_size

        self.order_book = LimitOrderBook(tick_size=tick_size)
        self.spread_analyzer = SpreadAnalyzer()
        self.impact_model = None

        self.quote_history = []
        self.trade_history = []
        self.metrics = {}

    def simulate_market(self, num_events: int = 1000, initial_price: float = 100.0,
                        volatility: float = 0.02, avg_daily_volume: int = 1000000):
        """Simulate market activity."""
        np.random.seed(42)
        self.impact_model = PriceImpactModel(sigma=volatility, avg_daily_volume=avg_daily_volume)

        current_mid = initial_price
        spread = 0.05

        # Populate initial book
        for i in range(5):
            bid = current_mid - spread/2 - i * self.tick_size
            ask = current_mid + spread/2 + i * self.tick_size
            qty = np.random.randint(100, 1000)
            self.order_book.add_order(OrderSide.BUY, OrderType.LIMIT, qty, bid)
            self.order_book.add_order(OrderSide.SELL, OrderType.LIMIT, qty, ask)

        base_time = datetime.now()

        for i in range(num_events):
            timestamp = base_time + timedelta(seconds=i)
            current_mid += np.random.normal(0, volatility/100)

            bid = self.order_book.best_bid()
            ask = self.order_book.best_ask()
            if bid and ask:
                self.spread_analyzer.add_quote(timestamp, bid, ask)
                self.quote_history.append({'timestamp': timestamp, 'bid': bid, 'ask': ask})

            event = np.random.choice(['limit', 'market', 'cancel'], p=[0.6, 0.25, 0.15])

            if event == 'limit':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 500)
                price = current_mid - np.random.uniform(0, spread) if side == OrderSide.BUY else current_mid + np.random.uniform(0, spread)
                self.order_book.add_order(side, OrderType.LIMIT, qty, price)
            elif event == 'market':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 300)
                order, trades = self.order_book.add_order(side, OrderType.MARKET, qty)
                for trade in trades:
                    self.spread_analyzer.add_trade(timestamp, trade.price, side.value)
                    self.trade_history.append({'timestamp': timestamp, 'price': trade.price, 'quantity': trade.quantity, 'side': side.value})
            else:
                if self.order_book.orders:
                    order_id = np.random.choice(list(self.order_book.orders.keys()))
                    self.order_book.cancel_order(order_id)

        print(f"Simulated {num_events} events: {len(self.quote_history)} quotes, {len(self.trade_history)} trades")

    def calculate_metrics(self) -> Dict:
        """Calculate comprehensive metrics."""
        spread_stats = self.spread_analyzer.summary_stats()
        imbalance = calculate_book_imbalance(self.order_book, levels=3)

        if self.trade_history:
            df_trades = pd.DataFrame(self.trade_history)
            buy_volume = df_trades[df_trades['side'] == 'buy']['quantity'].sum()
            sell_volume = df_trades[df_trades['side'] == 'sell']['quantity'].sum()
            order_flow_imbalance = (buy_volume - sell_volume) / (buy_volume + sell_volume) if (buy_volume + sell_volume) > 0 else 0
            avg_trade_size = df_trades['quantity'].mean()
            prices = df_trades['price'].values
            roll_spread = estimate_spread_roll(prices)
        else:
            order_flow_imbalance = 0
            avg_trade_size = 0
            roll_spread = None

        self.metrics = {
            'spread_stats': spread_stats,
            'book_imbalance': imbalance,
            'order_flow_imbalance': order_flow_imbalance,
            'avg_trade_size': avg_trade_size,
            'roll_spread_estimate': roll_spread
        }
        return self.metrics

    def analyze_execution(self, quantity: int) -> Dict:
        """Compare execution strategies."""
        algo = ExecutionAlgorithm(total_quantity=quantity, time_horizon=1.0)
        results = {
            'TWAP': algo.twap(),
            'VWAP': algo.vwap(),
            'Almgren-Chriss': algo.almgren_chriss()
        }

        if self.impact_model:
            for name, schedule in results.items():
                total_impact = sum(
                    self.impact_model.square_root_impact(qty)['total_impact_bps'] * qty
                    for qty in schedule['quantities']
                )
                schedule['estimated_cost_bps'] = total_impact / quantity

        return results

    def generate_report(self) -> str:
        """Generate formatted report."""
        if not self.metrics:
            self.calculate_metrics()

        lines = [
            "=" * 60,
            f"MICROSTRUCTURE ANALYSIS REPORT - {self.symbol}",
            "=" * 60,
            "",
            "SPREAD ANALYSIS",
            "-" * 40
        ]

        ss = self.metrics.get('spread_stats', {})
        if ss:
            lines.append(f"Average Quoted Spread: {ss.get('avg_quoted_spread_bps', 0):.2f} bps")
        if self.metrics.get('roll_spread_estimate'):
            lines.append(f"Roll Spread Estimate: ${self.metrics['roll_spread_estimate']:.4f}")

        lines.extend([
            "",
            "ORDER BOOK STATE",
            "-" * 40,
            f"Book Imbalance: {self.metrics.get('book_imbalance', 0):.2%}",
            f"Best Bid: ${self.order_book.best_bid():.2f}" if self.order_book.best_bid() else "Best Bid: N/A",
            f"Best Ask: ${self.order_book.best_ask():.2f}" if self.order_book.best_ask() else "Best Ask: N/A",
            "",
            "TRADING ACTIVITY",
            "-" * 40,
            f"Total Trades: {len(self.trade_history)}",
            f"Average Trade Size: {self.metrics.get('avg_trade_size', 0):.0f} shares",
            f"Order Flow Imbalance: {self.metrics.get('order_flow_imbalance', 0):.2%}",
            "",
            "=" * 60
        ])

        return "\n".join(lines)

# Run the analyzer
analyzer = MicrostructureAnalyzer(symbol='DEMO')
analyzer.simulate_market(num_events=500, initial_price=100.0)
metrics = analyzer.calculate_metrics()
print(analyzer.generate_report())

# Analyze execution
print("\nExecution Analysis for 10,000 shares:")
exec_results = analyzer.analyze_execution(10000)
for name, result in exec_results.items():
    cost = result.get('estimated_cost_bps', 'N/A')
    print(f"  {name}: {cost:.2f} bps estimated impact" if isinstance(cost, float) else f"  {name}: {cost}")

Module Project: Production Microstructure System

Put together everything you've learned to build a comprehensive microstructure analysis system.

# YOUR CODE HERE - Module Project
# Build a complete microstructure analysis system that:
# 1. Simulates realistic order book activity
# 2. Calculates spread metrics (quoted, effective, Roll estimate)
# 3. Models price impact for different order sizes
# 4. Compares execution algorithms (TWAP, VWAP, AC, IS)
# 5. Generates a comprehensive analysis report
Click to reveal solution
class ProductionMicrostructureSystem:
    """
    Complete microstructure analysis system for production use.

    Features:
    - Order book simulation and analysis
    - Spread decomposition (quoted, effective, Roll)
    - Price impact estimation
    - Execution algorithm comparison
    - Comprehensive reporting
    """

    def __init__(self, symbol: str, tick_size: float = 0.01):
        self.symbol = symbol
        self.tick_size = tick_size

        # Core components
        self.order_book = LimitOrderBook(tick_size=tick_size)
        self.spread_analyzer = SpreadAnalyzer()
        self.impact_model = None

        # Data storage
        self.quote_history = []
        self.trade_history = []
        self.metrics = {}
        self.execution_analysis = {}

    def initialize_market(self, initial_price: float = 100.0, 
                          volatility: float = 0.02,
                          avg_daily_volume: int = 1_000_000):
        """Initialize market parameters."""
        self.initial_price = initial_price
        self.volatility = volatility
        self.adv = avg_daily_volume
        self.impact_model = PriceImpactModel(sigma=volatility, avg_daily_volume=avg_daily_volume)

        # Build initial book
        spread = 0.05
        for i in range(5):
            bid = initial_price - spread/2 - i * self.tick_size
            ask = initial_price + spread/2 + i * self.tick_size
            qty = np.random.randint(200, 1000)
            self.order_book.add_order(OrderSide.BUY, OrderType.LIMIT, qty, bid)
            self.order_book.add_order(OrderSide.SELL, OrderType.LIMIT, qty, ask)

    def simulate_trading_day(self, num_events: int = 1000):
        """Simulate a full trading day."""
        np.random.seed(42)
        current_mid = self.initial_price
        base_time = datetime.now()
        spread = 0.05

        for i in range(num_events):
            timestamp = base_time + timedelta(seconds=i * 23.4)  # ~6.5 hours
            current_mid += np.random.normal(0, self.volatility / 100)

            # Record quote
            bid = self.order_book.best_bid()
            ask = self.order_book.best_ask()
            if bid and ask:
                self.spread_analyzer.add_quote(timestamp, bid, ask)
                self.quote_history.append({
                    'timestamp': timestamp, 'bid': bid, 'ask': ask,
                    'midpoint': (bid + ask) / 2
                })

            # Random event
            event = np.random.choice(['limit', 'market', 'cancel'], p=[0.6, 0.25, 0.15])

            if event == 'limit':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 500)
                offset = np.random.uniform(0, spread)
                price = current_mid - offset if side == OrderSide.BUY else current_mid + offset
                self.order_book.add_order(side, OrderType.LIMIT, qty, price)

            elif event == 'market':
                side = OrderSide.BUY if np.random.random() < 0.5 else OrderSide.SELL
                qty = np.random.randint(50, 300)
                order, trades = self.order_book.add_order(side, OrderType.MARKET, qty)

                for trade in trades:
                    self.spread_analyzer.add_trade(timestamp, trade.price, side.value)
                    self.trade_history.append({
                        'timestamp': timestamp,
                        'price': trade.price,
                        'quantity': trade.quantity,
                        'side': side.value
                    })
            else:
                if self.order_book.orders:
                    order_id = np.random.choice(list(self.order_book.orders.keys()))
                    self.order_book.cancel_order(order_id)

    def calculate_all_metrics(self) -> Dict:
        """Calculate comprehensive metrics."""
        # Spread metrics
        spread_stats = self.spread_analyzer.summary_stats()

        # Book imbalance
        book_imbalance = calculate_book_imbalance(self.order_book, levels=3)

        # Trade metrics
        if self.trade_history:
            df_trades = pd.DataFrame(self.trade_history)
            buy_vol = df_trades[df_trades['side'] == 'buy']['quantity'].sum()
            sell_vol = df_trades[df_trades['side'] == 'sell']['quantity'].sum()
            total_vol = buy_vol + sell_vol

            order_flow_imbalance = (buy_vol - sell_vol) / total_vol if total_vol > 0 else 0
            avg_trade_size = df_trades['quantity'].mean()

            # Roll spread
            prices = df_trades['price'].values
            roll_spread = estimate_spread_roll(prices)
        else:
            order_flow_imbalance = 0
            avg_trade_size = 0
            roll_spread = None

        self.metrics = {
            'spread': spread_stats,
            'book_imbalance': book_imbalance,
            'order_flow_imbalance': order_flow_imbalance,
            'avg_trade_size': avg_trade_size,
            'roll_spread': roll_spread,
            'num_quotes': len(self.quote_history),
            'num_trades': len(self.trade_history)
        }

        return self.metrics

    def analyze_execution_strategies(self, order_sizes: List[int]) -> Dict:
        """Analyze execution strategies for various order sizes."""
        results = {}

        for size in order_sizes:
            algo = ExecutionAlgorithm(total_quantity=size, time_horizon=1.0)

            strategies = {
                'TWAP': algo.twap(),
                'VWAP': algo.vwap(),
                'Almgren-Chriss': algo.almgren_chriss()
            }

            for name, schedule in strategies.items():
                if self.impact_model:
                    total_impact = sum(
                        self.impact_model.square_root_impact(qty)['total_impact_bps'] * qty
                        for qty in schedule['quantities']
                    )
                    schedule['estimated_cost_bps'] = total_impact / size

            results[size] = strategies

        self.execution_analysis = results
        return results

    def generate_full_report(self) -> str:
        """Generate comprehensive analysis report."""
        if not self.metrics:
            self.calculate_all_metrics()

        lines = [
            "=" * 70,
            "MICROSTRUCTURE ANALYSIS REPORT",
            f"Symbol: {self.symbol}",
            f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
            "=" * 70,
            "",
            "1. MARKET OVERVIEW",
            "-" * 50,
            f"   Initial Price: ${self.initial_price:.2f}",
            f"   Volatility: {self.volatility:.1%}",
            f"   Avg Daily Volume: {self.adv:,}",
            f"   Quotes Recorded: {self.metrics['num_quotes']:,}",
            f"   Trades Executed: {self.metrics['num_trades']:,}",
            "",
            "2. SPREAD ANALYSIS",
            "-" * 50
        ]

        ss = self.metrics.get('spread', {})
        if ss:
            lines.extend([
                f"   Avg Quoted Spread: {ss.get('avg_quoted_spread_bps', 0):.2f} bps",
                f"   Median Quoted Spread: {ss.get('median_quoted_spread_bps', 0):.2f} bps",
                f"   Spread Range: {ss.get('min_quoted_spread_bps', 0):.2f} - {ss.get('max_quoted_spread_bps', 0):.2f} bps"
            ])

        if ss and 'avg_effective_spread_bps' in ss:
            lines.append(f"   Avg Effective Spread: {ss['avg_effective_spread_bps']:.2f} bps")

        if self.metrics.get('roll_spread'):
            lines.append(f"   Roll Model Estimate: ${self.metrics['roll_spread']:.4f}")

        lines.extend([
            "",
            "3. ORDER BOOK STATE",
            "-" * 50,
            f"   Best Bid: ${self.order_book.best_bid():.2f}" if self.order_book.best_bid() else "   Best Bid: N/A",
            f"   Best Ask: ${self.order_book.best_ask():.2f}" if self.order_book.best_ask() else "   Best Ask: N/A",
            f"   Current Spread: ${self.order_book.spread():.2f}" if self.order_book.spread() else "   Current Spread: N/A",
            f"   Book Imbalance: {self.metrics['book_imbalance']:.2%}",
            "",
            "4. TRADING ACTIVITY",
            "-" * 50,
            f"   Average Trade Size: {self.metrics['avg_trade_size']:.0f} shares",
            f"   Order Flow Imbalance: {self.metrics['order_flow_imbalance']:.2%}",
        ])

        if self.execution_analysis:
            lines.extend([
                "",
                "5. EXECUTION ANALYSIS",
                "-" * 50
            ])

            for size, strategies in self.execution_analysis.items():
                lines.append(f"\n   Order Size: {size:,} shares ({size/self.adv:.1%} of ADV)")
                for name, result in strategies.items():
                    cost = result.get('estimated_cost_bps', 'N/A')
                    if isinstance(cost, float):
                        lines.append(f"      {name}: {cost:.2f} bps")

        lines.extend([
            "",
            "=" * 70,
            "END OF REPORT",
            "=" * 70
        ])

        return "\n".join(lines)


# Run complete analysis
system = ProductionMicrostructureSystem(symbol='AAPL')
system.initialize_market(initial_price=175.0, volatility=0.025, avg_daily_volume=50_000_000)
system.simulate_trading_day(num_events=1000)
system.calculate_all_metrics()
system.analyze_execution_strategies([10000, 50000, 100000, 500000])

print(system.generate_full_report())
system.order_book.display()

Key Takeaways

What You Learned

1. Order Book Mechanics

  • Limit order books match orders by price-time priority
  • Market orders provide immediate execution but pay the spread
  • Book imbalance can predict short-term price direction
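The imbalance signal in the last bullet is easy to state concretely. A minimal sketch (the function name and normalization are illustrative, not the module's `calculate_book_imbalance` API):

```python
def book_imbalance(bid_qty: float, ask_qty: float) -> float:
    """Depth imbalance in [-1, 1]: positive when bids dominate,
    which tends to precede short-term upward price pressure."""
    total = bid_qty + ask_qty
    return (bid_qty - ask_qty) / total if total > 0 else 0.0

print(book_imbalance(300, 100))  # 0.5 -> buy pressure
```

In practice the quantities are summed over the top few book levels, not just the inside quote.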

2. Bid-Ask Spread

  • Compensates market makers for inventory risk and adverse selection
  • Effective spread often differs from quoted spread
  • Roll model estimates spread from price autocovariance
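The Roll estimator in the last bullet follows from bid-ask bounce: when trades alternate between bid and ask, consecutive price changes are negatively autocorrelated, and the spread is S = 2·sqrt(−Cov(Δp_t, Δp_{t−1})). A minimal sketch, assuming a flat midpoint and a pure bounce process:

```python
import numpy as np

def roll_spread(prices):
    """Roll (1984) estimator: S = 2 * sqrt(-Cov(dp_t, dp_{t-1})).
    Returns None when the autocovariance is non-negative (model inapplicable)."""
    dp = np.diff(np.asarray(prices, dtype=float))
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    return 2 * np.sqrt(-cov) if cov < 0 else None

# pure bid-ask bounce around a flat mid of 100 with a $0.10 spread
rng = np.random.default_rng(0)
trades = 100 + 0.05 * rng.choice([-1, 1], size=10_000)
print(roll_spread(trades))  # ≈ 0.10, recovering the true spread
```

Trending prices give a positive autocovariance, which is why the estimator returns None rather than an imaginary spread.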

3. Price Impact

  • Grows with square root of trade size (not linearly)
  • Has permanent (information) and temporary (pressure) components
  • Key input for execution optimization
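The square-root law in the first bullet is commonly written impact ≈ k·σ·sqrt(Q/ADV). A back-of-envelope sketch; k is an empirical constant (the default of 1.0 below is an assumption, not a calibration):

```python
import numpy as np

def sqrt_impact_bps(qty: float, adv: float, sigma_daily: float, k: float = 1.0) -> float:
    """Expected impact in basis points under the square-root law: k * sigma * sqrt(Q/ADV)."""
    return k * sigma_daily * np.sqrt(qty / adv) * 1e4

# doubling the order multiplies impact by sqrt(2) ≈ 1.41, not 2
print(sqrt_impact_bps(50_000, 1_000_000, 0.02))   # ~44.7 bps
print(sqrt_impact_bps(100_000, 1_000_000, 0.02))  # ~63.2 bps
```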

4. Optimal Execution

  • TWAP: Simple, equal slices over time
  • VWAP: Match market volume profile
  • Almgren-Chriss: Optimal risk-cost tradeoff
  • Implementation Shortfall: Front-load when alpha decays
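TWAP, the simplest of the four, just divides the parent order evenly across the horizon. A minimal slicing sketch (the module's `ExecutionAlgorithm` class returns richer schedules than this):

```python
def twap_schedule(total_qty: int, num_slices: int) -> list:
    """Equal child orders; the remainder goes to the earliest slices
    so the sizes always sum to total_qty."""
    base, rem = divmod(total_qty, num_slices)
    return [base + (1 if i < rem else 0) for i in range(num_slices)]

print(twap_schedule(10_000, 8))  # eight equal slices of 1250
print(twap_schedule(10_001, 8))  # first slice absorbs the odd share
```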

Best Practices

  1. Understand your stock's typical spread and impact
  2. Size orders relative to ADV (< 10% participation is typical)
  3. Choose algorithm based on urgency and information
  4. Monitor execution quality vs benchmarks
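Practice 2 can be enforced mechanically. A sketch with an assumed 10% participation cap (the cap and function name are illustrative):

```python
def cap_order_size(desired_qty: int, adv: int, max_participation: float = 0.10) -> int:
    """Clip an order at a fraction of average daily volume;
    10% of ADV is a common rule of thumb, not a universal limit."""
    return min(desired_qty, int(adv * max_participation))

print(cap_order_size(500_000, 2_000_000))  # 200000 -> capped at 10% of ADV
```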

Coming Up Next

In Module 16: High-Frequency Concepts, we'll explore:

  • Latency and co-location
  • HFT strategies and market making
  • Regulations and market structure


Congratulations on completing Module 15!

Module 16: High-Frequency Concepts

Course 3: Quantitative Finance & Portfolio Theory
Part 5: Production & Infrastructure


Learning Objectives

By the end of this module, you will be able to:

  1. Understand latency measurement and optimization concepts
  2. Calculate network latency based on distance and medium
  3. Analyze co-location infrastructure and ROI
  4. Implement basic HFT strategy simulations
| Attribute | Value |
| --- | --- |
| Duration | ~2 hours |
| Exercises | 6 (3 guided + 3 open-ended) |
| Prerequisites | Module 15 (Market Microstructure) |

Important Note: This module is educational. Building actual HFT systems requires significant capital, specialized infrastructure, and regulatory compliance.

Setup and Imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Dict, Optional
from collections import deque
import time
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Module 16: High-Frequency Concepts")
print("=" * 45)

Section 16.1: Latency and Speed

In HFT, latency is everything. A system that's 10 microseconds faster can capture opportunities others miss.

In this section, you will learn:

  • Types of latency in trading systems
  • Latency measurement techniques
  • Statistical analysis of latency distributions

What is Latency?

Latency is the time delay in a system. For trading, we care about:

  1. Market Data Latency: Time for price updates to reach your system
  2. Processing Latency: Time for your system to make a decision
  3. Order Latency: Time for your order to reach the exchange
  4. Round-Trip Latency: Total time from signal to execution confirmation
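Round-trip latency is the sum of the outbound legs, the exchange's matching time, and the confirmation path back. A back-of-envelope decomposition (all numbers are illustrative assumptions, not measurements):

```python
# one hypothetical co-located budget, in microseconds
legs_us = {
    "market_data": 8.0,   # feed handler -> strategy
    "processing": 6.0,    # decision logic
    "order_out": 8.0,     # order gateway -> exchange
    "matching": 20.0,     # exchange matching engine + ack
}
round_trip_us = sum(legs_us.values())
print(round_trip_us)  # 42.0, inside a typical co-located <50 μs target
```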

Latency Benchmarks

| Component | Typical Range | HFT Target |
| --- | --- | --- |
| Network (co-located) | 1-10 μs | < 5 μs |
| Network (remote) | 1-100 ms | N/A |
| Software processing | 10-1000 μs | < 10 μs |
| Exchange matching | 5-50 μs | - |
| Round-trip (co-lo) | 20-100 μs | < 50 μs |
class LatencyMeasurement:
    """Tool for measuring and analyzing latency."""
    
    def __init__(self, name: str):
        self.name = name
        self.measurements = []
        self._start_time = None
    
    def start(self):
        """Start timing."""
        self._start_time = time.perf_counter_ns()
    
    def stop(self) -> int:
        """Stop timing and record measurement."""
        if self._start_time is None:
            raise ValueError("Timer not started")
        end_time = time.perf_counter_ns()
        latency_ns = end_time - self._start_time
        self.measurements.append(latency_ns)
        self._start_time = None
        return latency_ns
    
    def record(self, latency_ns: int):
        """Directly record a latency measurement."""
        self.measurements.append(latency_ns)
    
    def statistics(self) -> Dict:
        """Calculate latency statistics."""
        if not self.measurements:
            return {}
        arr = np.array(self.measurements)
        return {
            'name': self.name,
            'count': len(arr),
            'mean_ns': np.mean(arr),
            'mean_us': np.mean(arr) / 1000,
            'median_us': np.median(arr) / 1000,
            'std_ns': np.std(arr),
            'min_ns': np.min(arr),
            'max_ns': np.max(arr),
            'p50_ns': np.percentile(arr, 50),
            'p95_ns': np.percentile(arr, 95),
            'p99_ns': np.percentile(arr, 99),
            'p99_9_ns': np.percentile(arr, 99.9),
        }


class LatencyProfiler:
    """Profile latency across multiple components."""
    
    def __init__(self):
        self.components = {}
    
    def add_component(self, name: str) -> LatencyMeasurement:
        self.components[name] = LatencyMeasurement(name)
        return self.components[name]
    
    def get_component(self, name: str) -> LatencyMeasurement:
        return self.components.get(name)
    
    def summary(self) -> pd.DataFrame:
        rows = []
        for name, comp in self.components.items():
            stats = comp.statistics()
            if stats:
                rows.append({
                    'Component': name,
                    'Count': stats['count'],
                    'Mean (μs)': stats['mean_us'],
                    'Median (μs)': stats['median_us'],
                    'P95 (μs)': stats['p95_ns'] / 1000,
                    'P99 (μs)': stats['p99_ns'] / 1000,
                    'Max (μs)': stats['max_ns'] / 1000,
                })
        return pd.DataFrame(rows)
# Demonstrate latency measurement
print("Latency Measurement Demo")
print("=" * 50)

profiler = LatencyProfiler()

# Component 1: Dictionary lookup
dict_lookup = profiler.add_component("dict_lookup")
test_dict = {str(i): i for i in range(10000)}

for _ in range(1000):
    dict_lookup.start()
    _ = test_dict.get("5000")
    dict_lookup.stop()

# Component 2: List append
list_append = profiler.add_component("list_append")
test_list = []

for i in range(1000):
    list_append.start()
    test_list.append(i)
    list_append.stop()

# Component 3: NumPy operation
numpy_op = profiler.add_component("numpy_mean")
arr = np.random.randn(1000)

for _ in range(1000):
    numpy_op.start()
    _ = np.mean(arr)
    numpy_op.stop()

print(profiler.summary().to_string(index=False))
print("\nNote: These are Python operations - HFT systems use C++ for nanosecond-level operations")
# Simulate the impact of different latency levels on trading
class LatencySimulator:
    """Simulate how latency affects trading outcomes."""
    
    def __init__(self, opportunity_duration_us: float = 100):
        self.opportunity_duration = opportunity_duration_us
    
    def simulate_arbitrage(self, latency_us: float, initial_spread_bps: float = 10, 
                           num_opportunities: int = 1000) -> Dict:
        """Simulate arbitrage capture with given latency."""
        captured = 0
        missed = 0
        partial = 0
        total_profit_bps = 0
        
        for _ in range(num_opportunities):
            remaining_spread = initial_spread_bps * (1 - latency_us / self.opportunity_duration)
            
            if latency_us >= self.opportunity_duration:
                missed += 1
            elif remaining_spread >= initial_spread_bps * 0.5:
                captured += 1
                total_profit_bps += remaining_spread
            else:
                partial += 1
                total_profit_bps += max(0, remaining_spread)
        
        return {
            'latency_us': latency_us,
            'opportunities': num_opportunities,
            'captured': captured,
            'partial': partial,
            'missed': missed,
            'capture_rate': captured / num_opportunities,
            'total_profit_bps': total_profit_bps,
            'avg_profit_bps': total_profit_bps / num_opportunities
        }

# Compare different latency levels
simulator = LatencySimulator(opportunity_duration_us=100)
latencies = [10, 25, 50, 75, 100, 150, 200]
results = [simulator.simulate_arbitrage(lat, initial_spread_bps=10) for lat in latencies]
df_results = pd.DataFrame(results)

print("Impact of Latency on Arbitrage Capture")
print("=" * 60)
print(f"Opportunity duration: 100 μs, Initial spread: 10 bps\n")
print(df_results[['latency_us', 'capture_rate', 'avg_profit_bps']].to_string(index=False))
# Visualize latency impact
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(df_results['latency_us'], df_results['capture_rate'] * 100, marker='o', linewidth=2)
axes[0].fill_between(df_results['latency_us'], 0, df_results['capture_rate'] * 100, alpha=0.3)
axes[0].set_xlabel('Latency (μs)')
axes[0].set_ylabel('Capture Rate (%)')
axes[0].set_title('Arbitrage Capture Rate vs Latency')
axes[0].grid(True, alpha=0.3)

axes[1].plot(df_results['latency_us'], df_results['avg_profit_bps'], marker='s', linewidth=2, color='green')
axes[1].fill_between(df_results['latency_us'], 0, df_results['avg_profit_bps'], alpha=0.3, color='green')
axes[1].set_xlabel('Latency (μs)')
axes[1].set_ylabel('Average Profit (bps)')
axes[1].set_title('Profit per Opportunity vs Latency')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKey insight: In competitive HFT, even 10 μs can mean the difference")
print("between capturing an opportunity and missing it entirely.")

Exercise 16.1: Latency Budget Calculator (Guided)

Your Task: Calculate how to allocate a total latency budget across different system components.

Components that are harder to optimize should get larger allocations.

Fill in the blanks:

Exercise
Click to reveal solution
def calculate_latency_budget(total_budget_us: float, component_weights: Dict[str, float]) -> Dict:
    """
    Allocate latency budget across components.

    Higher weight = harder to optimize = larger allocation.
    """
    total_weight = sum(component_weights.values())

    allocations = {}
    for component, weight in component_weights.items():
        allocation = total_budget_us * (weight / total_weight)
        pct_of_total = (weight / total_weight) * 100

        allocations[component] = {
            'budget_us': allocation,
            'weight': weight,
            'pct_of_total': pct_of_total
        }

    return allocations

# Test
components = {
    'market_data_parsing': 2.0,
    'strategy_logic': 1.0,
    'risk_check': 0.5,
    'order_construction': 0.5,
    'network_io': 3.0,
}

budget = calculate_latency_budget(100, components)

print("Latency Budget Allocation (100 μs total)")
print("=" * 50)
for component, alloc in budget.items():
    print(f"{component:25} {alloc['budget_us']:6.1f} μs ({alloc['pct_of_total']:4.1f}%)")

Section 16.2: Co-Location Basics

Co-location means placing your trading servers physically close to the exchange's matching engine.

In this section, you will learn:

  • Why physical proximity matters
  • Network latency calculations
  • Co-location infrastructure costs and ROI

Why Co-Location Matters

Light travels at approximately:

  • 299,792 km/s in vacuum
  • ~200,000 km/s in fiber optic cable

This means:

  • 1 km of fiber = ~5 μs latency
  • NY to Chicago (~1,200 km) = ~6 ms minimum
  • NY to London (~5,500 km) = ~27 ms minimum
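These figures are pure propagation delay and can be checked in one line. Microwave links matter precisely because they run at near-vacuum speed over straighter paths (distances and speeds below are the same approximations used above):

```python
def propagation_ms(distance_km: float, speed_km_s: float) -> float:
    """One-way propagation delay only -- excludes routing,
    serialization, and switching overhead."""
    return distance_km / speed_km_s * 1000

print(propagation_ms(1_200, 200_000))  # 6.0 ms: NY-Chicago over fiber
print(propagation_ms(1_200, 299_792))  # ~4.0 ms: same route via microwave
```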

class NetworkLatencyCalculator:
    """Calculate network latency based on distance and medium."""
    
    SPEEDS = {
        'vacuum': 299792,
        'fiber': 200000,
        'microwave': 299792,
        'copper': 200000,
    }
    
    EXCHANGES = {
        'NYSE': {'location': 'Mahwah, NJ', 'lat': 41.08, 'lon': -74.14},
        'NASDAQ': {'location': 'Carteret, NJ', 'lat': 40.58, 'lon': -74.23},
        'CME': {'location': 'Aurora, IL', 'lat': 41.76, 'lon': -88.29},
        'LSE': {'location': 'Basildon, UK', 'lat': 51.57, 'lon': 0.49},
        'TSE': {'location': 'Tokyo, JP', 'lat': 35.68, 'lon': 139.75},
    }
    
    @classmethod
    def distance_km(cls, lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        """Calculate distance using Haversine formula."""
        R = 6371
        lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
        c = 2 * np.arcsin(np.sqrt(a))
        return R * c
    
    @classmethod
    def latency_one_way(cls, distance_km: float, medium: str = 'fiber', 
                        overhead_factor: float = 1.2) -> Dict:
        """Calculate one-way latency."""
        effective_distance = distance_km * overhead_factor
        speed = cls.SPEEDS.get(medium, cls.SPEEDS['fiber'])
        latency_seconds = effective_distance / speed
        
        return {
            'distance_km': distance_km,
            'effective_distance_km': effective_distance,
            'medium': medium,
            'latency_ms': latency_seconds * 1000,
            'latency_us': latency_seconds * 1000000,
            'round_trip_ms': latency_seconds * 2000
        }
    
    @classmethod
    def exchange_to_exchange(cls, exchange1: str, exchange2: str, 
                              medium: str = 'fiber') -> Dict:
        """Calculate latency between two exchanges."""
        loc1 = cls.EXCHANGES.get(exchange1)
        loc2 = cls.EXCHANGES.get(exchange2)
        if not loc1 or not loc2:
            return None
        
        distance = cls.distance_km(loc1['lat'], loc1['lon'], loc2['lat'], loc2['lon'])
        result = cls.latency_one_way(distance, medium)
        result['from'] = f"{exchange1} ({loc1['location']})"
        result['to'] = f"{exchange2} ({loc2['location']})"
        return result
# Calculate latencies between major exchanges
calc = NetworkLatencyCalculator()

print("Network Latency Between Major Exchanges")
print("=" * 70)

routes = [
    ('NYSE', 'NASDAQ'),
    ('NYSE', 'CME'),
    ('NYSE', 'LSE'),
    ('NYSE', 'TSE'),
    ('CME', 'LSE'),
]

for ex1, ex2 in routes:
    result = calc.exchange_to_exchange(ex1, ex2)
    print(f"{ex1} <-> {ex2}:")
    print(f"  Distance: {result['distance_km']:,.0f} km")
    print(f"  One-way (fiber): {result['latency_ms']:.2f} ms")
    print(f"  Round-trip: {result['round_trip_ms']:.2f} ms")
    print()
# Co-location advantage visualization
distances = [0.01, 0.1, 1, 10, 100, 1000]
latencies = [calc.latency_one_way(d, 'fiber', overhead_factor=1.0)['latency_us'] for d in distances]

fig, ax = plt.subplots(figsize=(10, 5))
ax.semilogx(distances, latencies, marker='o', linewidth=2, markersize=10)

annotations = [
    (0.01, "Same rack (10m)"),
    (0.1, "Same data center (100m)"),
    (10, "Same city (10km)"),
    (1000, "Cross-country (1000km)"),
]

for dist, label in annotations:
    idx = distances.index(dist)
    ax.annotate(label, (dist, latencies[idx]), textcoords="offset points", xytext=(10, 10), fontsize=9)

ax.set_xlabel('Distance (km)')
ax.set_ylabel('One-way Latency (μs)')
ax.set_title('Network Latency vs Distance (Fiber Optic)')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Key distances:")
for d, l in zip(distances, latencies):
    print(f"  {d*1000:>8.0f} meters: {l:>10.1f} μs")

Exercise 16.2: Co-Location ROI Calculator (Guided)

Your Task: Calculate the return on investment for a co-location setup.

Compare latency advantage against competitors to estimate profit potential.

Fill in the blanks:

Exercise
Click to reveal solution
def calculate_colocation_roi(setup_cost: float, monthly_cost: float,
                              our_latency_us: float, competitor_latency_us: float,
                              profit_per_us_saved: float = 500) -> Dict:
    """
    Calculate ROI of co-location setup.
    """
    latency_advantage = competitor_latency_us - our_latency_us
    daily_profit = max(0, latency_advantage * profit_per_us_saved)
    monthly_profit = daily_profit * 21  # ~21 trading days per month
    net_monthly = monthly_profit - monthly_cost

    if net_monthly > 0:
        payback_months = setup_cost / net_monthly
    else:
        payback_months = float('inf')

    return {
        'latency_advantage_us': latency_advantage,
        'daily_profit': daily_profit,
        'monthly_profit': monthly_profit,
        'monthly_cost': monthly_cost,
        'net_monthly': net_monthly,
        'setup_cost': setup_cost,
        'payback_months': payback_months
    }

# Compare setups
setups = [
    ('Basic (no FPGA)', 50000, 20000, 15),
    ('Premium (with FPGA)', 150000, 35000, 2),
]

print("Co-Location ROI Analysis")
print("=" * 50)
print(f"Competitor latency: 20 μs")
print(f"Profit per μs saved: $500/day\n")

for name, setup, monthly, latency in setups:
    roi = calculate_colocation_roi(setup, monthly, latency, 20, 500)
    print(f"{name}:")
    print(f"  Setup: ${setup:,}, Monthly: ${monthly:,}")
    print(f"  Our latency: {latency} μs, Advantage: {roi['latency_advantage_us']} μs")
    print(f"  Net monthly: ${roi['net_monthly']:,.0f}")
    print(f"  Payback: {roi['payback_months']:.1f} months")
    print()

Section 16.3: Common HFT Strategies

HFT strategies exploit speed advantages in various ways.

In this section, you will learn:

  • Market making mechanics
  • Statistical arbitrage concepts
  • Latency arbitrage basics

Strategy Categories

| Strategy | Description | Key Risk |
| --- | --- | --- |
| Market Making | Provide liquidity, earn spread | Inventory risk |
| Statistical Arbitrage | Exploit price discrepancies | Model risk |
| Latency Arbitrage | Trade on info faster | Speed competition |
| Event Arbitrage | React to news quickly | Information risk |
class HFTMarketMaker:
    """Simplified HFT market making simulator."""
    
    def __init__(self, symbol: str, inventory_limit: int = 1000, 
                 base_spread_bps: float = 5, volatility: float = 0.02):
        self.symbol = symbol
        self.inventory_limit = inventory_limit
        self.base_spread_bps = base_spread_bps
        self.volatility = volatility
        
        self.inventory = 0
        self.cash = 0
        self.trades = []
        self.pnl_history = []
    
    def calculate_quotes(self, mid_price: float, market_volatility: float = None) -> Dict:
        """Calculate bid and ask quotes with inventory skew."""
        # explicit None check: `or` would wrongly discard a passed-in volatility of 0
        vol = market_volatility if market_volatility is not None else self.volatility
        spread_pct = self.base_spread_bps / 10000
        spread_pct *= vol / 0.02
        
        inventory_ratio = self.inventory / self.inventory_limit
        skew = inventory_ratio * spread_pct * 0.5
        
        half_spread = spread_pct / 2
        bid = mid_price * (1 - half_spread - skew)
        ask = mid_price * (1 + half_spread - skew)
        
        return {'bid': bid, 'ask': ask, 'spread_pct': spread_pct, 'skew': skew, 'mid': mid_price}
    
    def process_fill(self, side: str, price: float, quantity: int):
        """Process a fill."""
        if side == 'buy':
            self.inventory += quantity
            self.cash -= price * quantity
        else:
            self.inventory -= quantity
            self.cash += price * quantity
        self.trades.append({'side': side, 'price': price, 'quantity': quantity, 'inventory': self.inventory})
    
    def simulate_session(self, initial_price: float = 100, num_ticks: int = 1000, 
                         fill_probability: float = 0.1) -> pd.DataFrame:
        """Simulate a trading session."""
        np.random.seed(42)
        price = initial_price
        
        for tick in range(num_ticks):
            price *= (1 + np.random.normal(0, self.volatility/100))
            quotes = self.calculate_quotes(price)
            
            if np.random.random() < fill_probability:
                if np.random.random() < 0.5:
                    if self.inventory < self.inventory_limit:
                        qty = np.random.randint(10, 50)
                        self.process_fill('buy', quotes['bid'], qty)
                else:
                    if self.inventory > -self.inventory_limit:
                        qty = np.random.randint(10, 50)
                        self.process_fill('sell', quotes['ask'], qty)
            
            mtm_pnl = self.cash + self.inventory * price
            self.pnl_history.append({'tick': tick, 'price': price, 'inventory': self.inventory, 
                                     'cash': self.cash, 'mtm_pnl': mtm_pnl})
        
        return pd.DataFrame(self.pnl_history)
# Run market making simulation
mm = HFTMarketMaker('DEMO', inventory_limit=500, base_spread_bps=5)
results = mm.simulate_session(initial_price=100, num_ticks=1000)

print("Market Making Simulation Results")
print("=" * 50)
print(f"Total Trades: {len(mm.trades)}")
print(f"Final Inventory: {mm.inventory} shares")
print(f"Final Cash: ${mm.cash:,.2f}")
print(f"Final MTM PnL: ${results['mtm_pnl'].iloc[-1]:,.2f}")
# Visualize market making session
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

axes[0].plot(results['tick'], results['price'], linewidth=1)
axes[0].set_ylabel('Price ($)')
axes[0].set_title('Price Evolution')
axes[0].grid(True, alpha=0.3)

axes[1].plot(results['tick'], results['inventory'], linewidth=1, color='orange')
axes[1].axhline(0, color='gray', linestyle='--')
axes[1].fill_between(results['tick'], 0, results['inventory'], alpha=0.3, color='orange')
axes[1].set_ylabel('Inventory')
axes[1].set_title('Inventory Position')
axes[1].grid(True, alpha=0.3)

axes[2].plot(results['tick'], results['mtm_pnl'], linewidth=1, color='green')
axes[2].fill_between(results['tick'], 0, results['mtm_pnl'], alpha=0.3, color='green')
axes[2].set_ylabel('MTM PnL ($)')
axes[2].set_xlabel('Tick')
axes[2].set_title('Mark-to-Market PnL')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
class PairsArbitrage:
    """Simple pairs trading / stat arb strategy."""
    
    def __init__(self, entry_zscore: float = 2.0, exit_zscore: float = 0.5, lookback: int = 100):
        self.entry_zscore = entry_zscore
        self.exit_zscore = exit_zscore
        self.lookback = lookback
        self.spread_history = deque(maxlen=lookback)
        self.position = 0
    
    def update_and_signal(self, price_a: float, price_b: float, ratio: float = 1.0) -> Dict:
        """Update spread history and generate trading signal."""
        spread = price_a - ratio * price_b
        self.spread_history.append(spread)
        
        if len(self.spread_history) < self.lookback:
            # Return the full key set so the resulting DataFrame has consistent columns
            return {'signal': 'wait', 'zscore': None, 'spread': spread, 'position': self.position}
        
        spread_array = np.array(self.spread_history)
        mean = np.mean(spread_array)
        std = np.std(spread_array)
        zscore = (spread - mean) / std if std != 0 else 0
        
        signal = 'hold'
        
        if self.position == 0:
            if zscore > self.entry_zscore:
                signal = 'short_spread'
                self.position = -1
            elif zscore < -self.entry_zscore:
                signal = 'long_spread'
                self.position = 1
        else:
            if self.position == 1 and zscore >= -self.exit_zscore:
                signal = 'close_long'
                self.position = 0
            elif self.position == -1 and zscore <= self.exit_zscore:
                signal = 'close_short'
                self.position = 0
        
        return {'signal': signal, 'zscore': zscore, 'spread': spread, 'position': self.position}
# Simulate pairs trading
np.random.seed(123)

n_points = 500
common_factor = np.cumsum(np.random.randn(n_points) * 0.5)
noise_a = np.cumsum(np.random.randn(n_points) * 0.2)
noise_b = np.cumsum(np.random.randn(n_points) * 0.2)

price_a = 100 + common_factor + noise_a
price_b = 100 + common_factor + noise_b

pairs = PairsArbitrage(entry_zscore=2.0, exit_zscore=0.5)
signals = [pairs.update_and_signal(pa, pb) for pa, pb in zip(price_a, price_b)]
df_signals = pd.DataFrame(signals)

signal_counts = df_signals['signal'].value_counts()
print("Pairs Trading Simulation")
print("=" * 40)
print("Signal Counts:")
for sig, count in signal_counts.items():
    print(f"  {sig}: {count}")
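
The deque-based class above computes the z-score one tick at a time; for research, the same statistic can be computed in vectorized form with pandas. A minimal sketch on synthetic cointegrated prices (illustrative data; note that pandas' rolling std uses the sample estimator, ddof=1, whereas np.std in the class defaults to ddof=0):

```python
import numpy as np
import pandas as pd

# Vectorized rolling z-score of the spread, lookback = 100
rng = np.random.default_rng(123)
common = rng.standard_normal(500).cumsum() * 0.5
pa = pd.Series(100 + common + rng.standard_normal(500).cumsum() * 0.2)
pb = pd.Series(100 + common + rng.standard_normal(500).cumsum() * 0.2)

spread = pa - pb
roll = spread.rolling(100)
zscore = (spread - roll.mean()) / roll.std()  # NaN until the window fills
print(f"entries beyond |z| = 2: {(zscore.abs() > 2).sum()}")
```

The vectorized version is convenient for backtests over a full history; the incremental class is what you would run live, tick by tick.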

Exercise 16.3: Market Maker Spread Calculator (Guided)

Your Task: Calculate optimal bid-ask spreads based on inventory and volatility.

The spread should widen when:

  • Inventory is high (to reduce the position)
  • Volatility is high (to compensate for risk)

Fill in the blanks:

Solution:
def calculate_optimal_spread(mid_price: float, base_spread_bps: float,
                              inventory: int, inventory_limit: int,
                              volatility: float, base_volatility: float = 0.02) -> Dict:
    """
    Calculate optimal bid-ask spread.
    """
    spread_pct = base_spread_bps / 10000
    vol_multiplier = volatility / base_volatility
    adjusted_spread = spread_pct * vol_multiplier
    inventory_ratio = inventory / inventory_limit
    skew = inventory_ratio * adjusted_spread * 0.5

    half_spread = adjusted_spread / 2
    bid = mid_price * (1 - half_spread - skew)
    ask = mid_price * (1 + half_spread - skew)

    return {
        'bid': bid,
        'ask': ask,
        'spread_bps': adjusted_spread * 10000,
        'skew_bps': skew * 10000,
        'vol_multiplier': vol_multiplier
    }

# Test with different scenarios
print("Market Maker Spread Analysis")
print("=" * 60)

scenarios = [
    ("Neutral inventory, low vol", 0, 0.01),
    ("Long inventory, low vol", 500, 0.01),
    ("Short inventory, low vol", -500, 0.01),
    ("Neutral inventory, high vol", 0, 0.04),
    ("Long inventory, high vol", 500, 0.04),
]

for desc, inv, vol in scenarios:
    result = calculate_optimal_spread(100, 5, inv, 1000, vol, 0.02)
    print(f"{desc}:")
    print(f"  Bid: ${result['bid']:.4f}, Ask: ${result['ask']:.4f}")
    print(f"  Spread: {result['spread_bps']:.2f} bps, Skew: {result['skew_bps']:.2f} bps")
    print()

Section 16.4: Regulatory Considerations

HFT operates in a heavily regulated environment.

In this section, you will learn:

  • Key regulations (Reg NMS, MiFID II)
  • Prohibited practices (spoofing, layering)
  • Pre-trade risk controls

Prohibited Practices

  • Spoofing: Placing orders with no intention of executing them, then canceling once the market reacts
  • Layering: Stacking multiple non-bona-fide orders at several price levels to create a false impression of supply or demand
  • Quote Stuffing: Flooding exchanges with orders and cancellations to slow down competitors
class ComplianceChecker:
    """Basic compliance checking for trading activities."""
    
    def __init__(self, config: Dict = None):
        self.config = config or {
            'max_order_rate_per_second': 100,
            'max_cancel_ratio': 0.9,
            'max_position_size': 10000,
            'max_daily_orders': 50000,
        }
        self.order_timestamps = deque(maxlen=1000)
        self.orders_sent = 0
        self.orders_canceled = 0
        self.orders_filled = 0
        self.violations = []
    
    def check_order_rate(self) -> Dict:
        """Check if order rate exceeds limit."""
        now = time.time()
        while self.order_timestamps and (now - self.order_timestamps[0]) > 1:
            self.order_timestamps.popleft()
        
        current_rate = len(self.order_timestamps)
        max_rate = self.config['max_order_rate_per_second']
        
        if current_rate >= max_rate:
            return {'passed': False, 'reason': f'Order rate {current_rate}/s exceeds limit {max_rate}/s'}
        return {'passed': True}
    
    def check_cancel_ratio(self) -> Dict:
        """Check if cancellation ratio is suspicious."""
        if self.orders_sent < 100:
            return {'passed': True}
        
        cancel_ratio = self.orders_canceled / self.orders_sent
        max_ratio = self.config['max_cancel_ratio']
        
        if cancel_ratio > max_ratio:
            return {'passed': False, 'reason': f'Cancel ratio {cancel_ratio:.1%} exceeds limit {max_ratio:.1%}'}
        return {'passed': True}
    
    def check_position_limit(self, current_position: int, order_qty: int) -> Dict:
        """Check if order would exceed position limits."""
        max_pos = self.config['max_position_size']
        resulting_position = current_position + order_qty
        
        if abs(resulting_position) > max_pos:
            return {'passed': False, 'reason': f'Position {resulting_position} exceeds limit {max_pos}'}
        return {'passed': True}
    
    def pre_order_check(self, order: Dict) -> Dict:
        """Run all pre-order compliance checks."""
        checks = [
            ('order_rate', self.check_order_rate()),
            ('cancel_ratio', self.check_cancel_ratio()),
            ('position_limit', self.check_position_limit(
                order.get('current_position', 0), order.get('quantity', 0)))
        ]
        
        all_passed = all(c[1]['passed'] for c in checks)
        failed_checks = [(name, result) for name, result in checks if not result['passed']]
        
        if not all_passed:
            self.violations.append({'timestamp': datetime.now(), 'order': order, 'failed_checks': failed_checks})
        
        return {'approved': all_passed, 'checks': dict(checks), 'failed': failed_checks}
    
    def record_order(self):
        self.orders_sent += 1
        self.order_timestamps.append(time.time())
    
    def record_cancel(self):
        self.orders_canceled += 1
    
    def record_fill(self):
        self.orders_filled += 1
    
    def summary(self) -> Dict:
        return {
            'orders_sent': self.orders_sent,
            'orders_canceled': self.orders_canceled,
            'orders_filled': self.orders_filled,
            'cancel_ratio': self.orders_canceled / max(1, self.orders_sent),
            'fill_ratio': self.orders_filled / max(1, self.orders_sent),
            'violations': len(self.violations)
        }
# Demo compliance checker
checker = ComplianceChecker()

for i in range(200):
    order = {
        'symbol': 'TEST',
        'side': 'buy',
        'quantity': 100,
        'current_position': i * 50 if i < 50 else 2500
    }
    
    result = checker.pre_order_check(order)
    
    if result['approved']:
        checker.record_order()
        if np.random.random() < 0.85:
            checker.record_cancel()
        else:
            checker.record_fill()

print("Compliance Summary")
print("=" * 40)
summary = checker.summary()
for key, value in summary.items():
    if isinstance(value, float):
        print(f"{key}: {value:.1%}")
    else:
        print(f"{key}: {value}")

if checker.violations:
    print(f"\nWarning: {len(checker.violations)} compliance violations detected!")

Exercise 16.4: Spoofing Detector (Open-ended)

Your Task:

Build a class that detects potential spoofing behavior by analyzing order patterns.

The detector should:

  • Track orders and their outcomes (filled vs canceled)
  • Flag suspicious patterns (high cancel rate, short order lifetime)
  • Calculate a spoofing risk score

Your implementation:

Solution:
class SpoofingDetector:
    """Detect potential spoofing behavior."""

    def __init__(self, cancel_threshold: float = 0.9, min_lifetime_ms: float = 100):
        self.cancel_threshold = cancel_threshold
        self.min_lifetime_ms = min_lifetime_ms

        self.orders = {}
        self.canceled_count = 0
        self.filled_count = 0
        self.short_lived_count = 0

    def record_order(self, order_id: str, timestamp: datetime, 
                     side: str, price: float, quantity: int):
        """Record a new order."""
        self.orders[order_id] = {
            'timestamp': timestamp,
            'side': side,
            'price': price,
            'quantity': quantity,
            'status': 'active'
        }

    def record_cancel(self, order_id: str, timestamp: datetime):
        """Record order cancellation."""
        if order_id not in self.orders:
            return

        order = self.orders[order_id]
        lifetime_ms = (timestamp - order['timestamp']).total_seconds() * 1000

        order['status'] = 'canceled'
        order['lifetime_ms'] = lifetime_ms

        self.canceled_count += 1
        if lifetime_ms < self.min_lifetime_ms:
            self.short_lived_count += 1

    def record_fill(self, order_id: str, timestamp: datetime):
        """Record order fill."""
        if order_id not in self.orders:
            return

        order = self.orders[order_id]
        lifetime_ms = (timestamp - order['timestamp']).total_seconds() * 1000

        order['status'] = 'filled'
        order['lifetime_ms'] = lifetime_ms
        self.filled_count += 1

    def calculate_risk_score(self) -> Dict:
        """Calculate spoofing risk score."""
        total_orders = self.canceled_count + self.filled_count

        if total_orders == 0:
            return {'score': 0, 'flags': []}

        cancel_ratio = self.canceled_count / total_orders
        short_lived_ratio = self.short_lived_count / total_orders

        flags = []
        score = 0

        if cancel_ratio > self.cancel_threshold:
            flags.append(f'High cancel ratio: {cancel_ratio:.1%}')
            score += 50

        if short_lived_ratio > 0.5:
            flags.append(f'Many short-lived orders: {short_lived_ratio:.1%}')
            score += 30

        if cancel_ratio > 0.95 and short_lived_ratio > 0.7:
            flags.append('CRITICAL: Pattern consistent with spoofing')
            score += 20

        return {
            'score': min(100, score),
            'cancel_ratio': cancel_ratio,
            'short_lived_ratio': short_lived_ratio,
            'flags': flags,
            'total_orders': total_orders
        }

# Test the detector
detector = SpoofingDetector(cancel_threshold=0.9, min_lifetime_ms=100)
base_time = datetime.now()

for i in range(100):
    order_time = base_time + timedelta(milliseconds=i*10)
    detector.record_order(f'order_{i}', order_time, 'buy', 100.0, 100)

    if np.random.random() < 0.95:
        cancel_time = order_time + timedelta(milliseconds=np.random.uniform(10, 50))
        detector.record_cancel(f'order_{i}', cancel_time)
    else:
        fill_time = order_time + timedelta(milliseconds=np.random.uniform(100, 500))
        detector.record_fill(f'order_{i}', fill_time)

result = detector.calculate_risk_score()
print("Spoofing Detection Results")
print("=" * 50)
print(f"Risk Score: {result['score']}/100")
print(f"Cancel Ratio: {result['cancel_ratio']:.1%}")
for flag in result['flags']:
    print(f"  - {flag}")

Exercise 16.5: Latency Anomaly Detector (Open-ended)

Your Task:

Build a class that detects latency anomalies (spikes) in a trading system.

The detector should:

  • Track latency measurements over time
  • Identify anomalies using percentile thresholds
  • Provide statistics and anomaly counts

Your implementation:

Solution:
class LatencyAnomalyDetector:
    """Detect latency anomalies in trading systems."""

    def __init__(self, window_size: int = 1000, anomaly_percentile: float = 99):
        self.window_size = window_size
        self.anomaly_percentile = anomaly_percentile

        self.latencies = deque(maxlen=window_size)
        self.timestamps = deque(maxlen=window_size)
        self.anomaly_count = 0
        self.total_count = 0

    def record_latency(self, timestamp: datetime, latency_us: float):
        """Record a latency measurement."""
        self.latencies.append(latency_us)
        self.timestamps.append(timestamp)
        self.total_count += 1

        if self.is_anomaly(latency_us):
            self.anomaly_count += 1

    def is_anomaly(self, latency_us: float) -> bool:
        """Check if latency is an anomaly."""
        if len(self.latencies) < 100:
            return False

        threshold = np.percentile(list(self.latencies), self.anomaly_percentile)
        return latency_us > threshold

    def get_statistics(self) -> Dict:
        """Get latency statistics."""
        if not self.latencies:
            return {}

        arr = np.array(self.latencies)

        return {
            'count': len(arr),
            'mean_us': np.mean(arr),
            'median_us': np.median(arr),
            'std_us': np.std(arr),
            'min_us': np.min(arr),
            'max_us': np.max(arr),
            'p95_us': np.percentile(arr, 95),
            'p99_us': np.percentile(arr, 99),
            'anomaly_count': self.anomaly_count,
            'anomaly_rate': self.anomaly_count / max(1, self.total_count)
        }

# Test
detector = LatencyAnomalyDetector(window_size=1000, anomaly_percentile=99)
np.random.seed(42)
base_time = datetime.now()

for i in range(2000):
    timestamp = base_time + timedelta(microseconds=i*100)
    base_latency = 10 * np.random.lognormal(0, 0.3)

    if np.random.random() < 0.02:
        latency = base_latency * np.random.uniform(5, 20)
    else:
        latency = base_latency

    detector.record_latency(timestamp, latency)

stats = detector.get_statistics()
print("Latency Anomaly Detection")
print("=" * 50)
print(f"Mean: {stats['mean_us']:.2f} μs")
print(f"P99: {stats['p99_us']:.2f} μs")
print(f"Anomalies: {stats['anomaly_count']} ({stats['anomaly_rate']:.1%})")

Exercise 16.6: Complete HFT System Analyzer (Open-ended)

Your Task:

Build a comprehensive HFT system analyzer that combines latency profiling, compliance checking, and performance metrics.

The analyzer should:

  • Profile latency across multiple components
  • Run compliance checks on simulated orders
  • Detect latency anomalies
  • Generate a comprehensive report

Your implementation:

Solution:
class HFTSystemAnalyzer:
    """Comprehensive HFT system analysis tool."""

    def __init__(self, name: str = "HFT System"):
        self.name = name
        self.profiler = LatencyProfiler()
        self.compliance = ComplianceChecker()

        self.components = [
            'market_data_recv', 'data_parsing', 'strategy_compute',
            'risk_check', 'order_send', 'order_ack'
        ]

        for comp in self.components:
            self.profiler.add_component(comp)

        self.tick_data = []

    def simulate_tick(self, base_latencies: Dict = None) -> Dict:
        """Simulate a single tick."""
        base = base_latencies or {
            'market_data_recv': 5, 'data_parsing': 2, 'strategy_compute': 10,
            'risk_check': 1, 'order_send': 3, 'order_ack': 5
        }

        tick_latencies = {}
        total_latency = 0

        for comp in self.components:
            base_us = base.get(comp, 5)
            actual_us = base_us * np.random.lognormal(0, 0.3)
            actual_ns = int(actual_us * 1000)

            self.profiler.get_component(comp).record(actual_ns)
            tick_latencies[comp] = actual_us
            total_latency += actual_us

        tick_latencies['total'] = total_latency
        self.tick_data.append(tick_latencies)
        return tick_latencies

    def run_simulation(self, num_ticks: int = 10000) -> pd.DataFrame:
        """Run full simulation."""
        for _ in range(num_ticks):
            self.simulate_tick()
        return pd.DataFrame(self.tick_data)

    def latency_breakdown(self) -> pd.DataFrame:
        """Get latency breakdown."""
        breakdown = []
        total_mean = 0

        for comp in self.components:
            stats = self.profiler.get_component(comp).statistics()
            if stats:
                breakdown.append({
                    'component': comp,
                    'mean_us': stats['mean_us'],
                    'p99_us': stats['p99_ns'] / 1000
                })
                total_mean += stats['mean_us']

        for item in breakdown:
            item['pct_of_total'] = item['mean_us'] / total_mean * 100

        return pd.DataFrame(breakdown)

    def generate_report(self) -> str:
        """Generate comprehensive report."""
        report = [
            "=" * 70,
            f"HFT SYSTEM ANALYSIS REPORT: {self.name}",
            "=" * 70,
            "",
            "COMPONENT SUMMARY",
            "-" * 50,
            self.profiler.summary().to_string(index=False),
            "",
            "LATENCY BREAKDOWN",
            "-" * 50
        ]

        breakdown = self.latency_breakdown()
        total_mean = breakdown['mean_us'].sum()
        total_p99 = breakdown['p99_us'].sum()

        report.append(breakdown.to_string(index=False))
        report.extend([
            "",
            f"Total Mean Latency: {total_mean:.2f} μs",
            f"Total P99 Latency: {total_p99:.2f} μs",
            "",
            "RECOMMENDATIONS",
            "-" * 50
        ])

        bottleneck = breakdown.loc[breakdown['mean_us'].idxmax()]
        report.append(f"1. Primary bottleneck: {bottleneck['component']} ({bottleneck['pct_of_total']:.1f}%)")

        if total_mean > 50:
            report.append("2. Consider FPGA acceleration")
        if total_p99 > total_mean * 3:
            report.append("3. High jitter - investigate sources")

        report.extend(["", "=" * 70])
        return "\n".join(report)

# Run
analyzer = HFTSystemAnalyzer("Production Trading System")
df = analyzer.run_simulation(10000)
print(analyzer.generate_report())

Module Project: Complete HFT Analysis Suite

Build a production-ready HFT analysis system combining all concepts.

# YOUR CODE HERE - Module Project
# Build a complete HFT analysis suite that:
# 1. Profiles latency across all system components
# 2. Simulates market making with inventory management
# 3. Runs compliance checks on all orders
# 4. Detects latency anomalies and spoofing patterns
# 5. Generates a comprehensive report with recommendations

Key Takeaways

What You Learned

1. Latency and Speed

  • In HFT, microseconds matter - 10μs can mean profit or loss
  • Measure latency at each component to identify bottlenecks
  • Focus on tail latencies (P99) not just mean
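
A quick illustration of why the tail matters, using synthetic numbers: a handful of large spikes barely move the mean, but they dominate the 99th percentile, and it is those spikes that hit live orders.

```python
import numpy as np

# Baseline ~10 us latencies with rare 15x spikes
rng = np.random.default_rng(1)
latencies = rng.lognormal(mean=2.3, sigma=0.3, size=10_000)
spike = rng.random(10_000) < 0.01   # ~1% of samples spike
latencies[spike] *= 15

print(f"mean: {latencies.mean():.1f} us | "
      f"p50: {np.percentile(latencies, 50):.1f} us | "
      f"p99: {np.percentile(latencies, 99):.1f} us")
```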

2. Co-Location

  • Physical proximity to exchanges dramatically reduces latency
  • Speed of light limits minimum latency based on distance
  • Co-location is expensive but essential for competitive HFT
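
The speed-of-light floor is easy to estimate: light in fiber travels roughly 1/1.47 of c (the fiber's refractive index), so distance alone sets a hard lower bound before any switching or serialization delay. A back-of-envelope sketch with approximate, illustrative route distances:

```python
C_KM_PER_MS = 299.792              # speed of light in vacuum, km per millisecond
FIBER_SPEED = C_KM_PER_MS / 1.47   # propagation speed in fiber

def min_fiber_latency_ms(distance_km: float) -> float:
    """One-way propagation delay over ideal fiber (ignores switching/serialization)."""
    return distance_km / FIBER_SPEED

for route, km in [("NY-Chicago", 1150), ("NY-London", 5570), ("co-located rack", 0.1)]:
    print(f"{route:>15}: {min_fiber_latency_ms(km) * 1000:.1f} us one-way minimum")
```

No amount of software optimization beats moving the server: a co-located machine's propagation delay is under a microsecond, versus thousands of microseconds cross-country.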

3. HFT Strategies

  • Market making: Provide liquidity, earn the spread
  • Statistical arbitrage: Exploit price discrepancies
  • Latency arbitrage: React faster than others
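
The market-making economics reduce to simple round-trip arithmetic. A toy example with illustrative numbers (the fee is an assumption, not a real exchange schedule):

```python
# Buy at your bid, sell at your ask: capture the spread, pay costs twice
bid, ask = 99.95, 100.05        # a 10 bps spread around a $100 mid
fee_per_share = 0.002           # assumed all-in cost per share, per side
qty = 100

gross = (ask - bid) * qty
net = gross - 2 * fee_per_share * qty
print(f"gross: ${gross:.2f}, net after fees: ${net:.2f} per round trip")
```

Thin per-trade margins like this are why market makers depend on volume, and why adverse selection (getting filled just before the price moves against you) can wipe out the spread entirely.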

4. Regulatory Compliance

  • HFT is heavily regulated
  • Prohibited: spoofing, layering, quote stuffing
  • Pre-trade risk controls are required

Reality Check

  • Building HFT systems requires millions in capital
  • Competition has compressed returns
  • Most retail traders should focus on longer-term strategies

Coming Up Next

In Module 17: Cloud Deployment, we'll learn how to deploy trading systems in the cloud - a more accessible approach for most traders.


Congratulations on completing Module 16!

Module 17: Cloud Deployment

Part 5: Production & Infrastructure

Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)

Learning Objectives

By the end of this module, you will be able to:

  • Design cloud architectures for trading systems on AWS/GCP/Azure
  • Containerize trading applications with Docker
  • Build serverless functions for event-driven workflows
  • Implement CI/CD pipelines for automated deployment


# Environment setup
import os
import json
import yaml
import hashlib
from datetime import datetime
from dataclasses import dataclass, field
from typing import List, Dict, Optional
import warnings
warnings.filterwarnings('ignore')

print("Module 17: Cloud Deployment")
print("=" * 40)
print()
print("Note: This module covers cloud concepts and configuration.")
print("Actual deployment requires cloud provider accounts.")

Section 17.1: Cloud Architecture

Major Cloud Providers

Provider Strengths Best For
AWS Most services, market leader General purpose, enterprise
GCP Data analytics, ML, networking Data-heavy applications
Azure Microsoft integration, enterprise Corporate environments

Key Services for Trading Systems

Service Type AWS GCP Azure
Compute EC2 Compute Engine Virtual Machines
Serverless Lambda Cloud Functions Functions
Containers ECS/EKS GKE AKS
Database RDS/DynamoDB Cloud SQL/Firestore SQL Database
Message Queue SQS/SNS Pub/Sub Service Bus
Storage S3 Cloud Storage Blob Storage
@dataclass
class CloudService:
    """Represents a cloud service configuration."""
    name: str
    provider: str  # aws, gcp, azure
    service_type: str
    tier: str
    region: str
    config: Dict = field(default_factory=dict)
    
    def estimated_monthly_cost(self) -> float:
        """Estimate monthly cost based on tier."""
        # Simplified cost estimates
        base_costs = {
            'compute': {'small': 50, 'medium': 150, 'large': 400},
            'serverless': {'small': 10, 'medium': 50, 'large': 200},
            'database': {'small': 30, 'medium': 100, 'large': 500},
            'storage': {'small': 5, 'medium': 25, 'large': 100},
            'queue': {'small': 1, 'medium': 10, 'large': 50},
        }
        
        return base_costs.get(self.service_type, {}).get(self.tier, 0)


class CloudArchitecture:
    """
    Design and plan cloud architecture for trading systems.
    """
    
    def __init__(self, name: str, provider: str = 'aws'):
        self.name = name
        self.provider = provider
        self.services: List[CloudService] = []
        self.connections: List[tuple] = []  # (from_service, to_service)
    
    def add_service(self, name: str, service_type: str, 
                    tier: str = 'medium', region: str = 'us-east-1',
                    config: Dict = None) -> CloudService:
        """Add a service to the architecture."""
        service = CloudService(
            name=name,
            provider=self.provider,
            service_type=service_type,
            tier=tier,
            region=region,
            config=config or {}
        )
        self.services.append(service)
        return service
    
    def connect(self, from_service: str, to_service: str):
        """Define a connection between services."""
        self.connections.append((from_service, to_service))
    
    def total_estimated_cost(self) -> float:
        """Calculate total estimated monthly cost."""
        return sum(s.estimated_monthly_cost() for s in self.services)
    
    def generate_terraform(self) -> str:
        """Generate basic Terraform configuration."""
        tf_config = []
        tf_config.append(f'# Terraform configuration for {self.name}')
        tf_config.append(f'# Provider: {self.provider}')
        tf_config.append('')
        
        # Provider block
        if self.provider == 'aws':
            tf_config.append('provider "aws" {')
            tf_config.append('  region = "us-east-1"')
            tf_config.append('}')
        elif self.provider == 'gcp':
            tf_config.append('provider "google" {')
            tf_config.append('  project = "your-project-id"')
            tf_config.append('  region  = "us-central1"')
            tf_config.append('}')
        
        tf_config.append('')
        
        # Resource blocks
        for service in self.services:
            tf_config.append(f'# {service.name} ({service.service_type})')
            resource_name = service.name.lower().replace(' ', '_').replace('-', '_')
            
            if service.service_type == 'compute':
                if self.provider == 'aws':
                    tf_config.append(f'resource "aws_instance" "{resource_name}" {{')
                    tf_config.append(f'  ami           = "ami-0c55b159cbfafe1f0"')
                    tf_config.append(f'  instance_type = "{self._get_instance_type(service.tier)}"')
                    tf_config.append(f'  tags = {{')
                    tf_config.append(f'    Name = "{service.name}"')
                    tf_config.append(f'  }}')
                    tf_config.append('}')
            
            elif service.service_type == 'serverless':
                if self.provider == 'aws':
                    tf_config.append(f'resource "aws_lambda_function" "{resource_name}" {{')
                    tf_config.append(f'  function_name = "{resource_name}"')
                    tf_config.append(f'  runtime       = "python3.9"')
                    tf_config.append(f'  handler       = "handler.main"')
                    tf_config.append(f'  memory_size   = {self._get_lambda_memory(service.tier)}')
                    tf_config.append('}')
            
            elif service.service_type == 'database':
                if self.provider == 'aws':
                    tf_config.append(f'resource "aws_db_instance" "{resource_name}" {{')
                    tf_config.append(f'  identifier        = "{resource_name}"')
                    tf_config.append(f'  engine            = "postgres"')
                    tf_config.append(f'  instance_class    = "{self._get_db_instance(service.tier)}"')
                    tf_config.append(f'  allocated_storage = 20')
                    tf_config.append('}')
            
            tf_config.append('')
        
        return '\n'.join(tf_config)
    
    def _get_instance_type(self, tier):
        """Get AWS instance type for tier."""
        mapping = {'small': 't3.micro', 'medium': 't3.medium', 'large': 't3.xlarge'}
        return mapping.get(tier, 't3.medium')
    
    def _get_lambda_memory(self, tier):
        """Get Lambda memory for tier."""
        mapping = {'small': 128, 'medium': 512, 'large': 2048}
        return mapping.get(tier, 512)
    
    def _get_db_instance(self, tier):
        """Get RDS instance class for tier."""
        mapping = {'small': 'db.t3.micro', 'medium': 'db.t3.medium', 'large': 'db.r5.large'}
        return mapping.get(tier, 'db.t3.medium')
    
    def display_architecture(self):
        """Display architecture summary."""
        print(f"Cloud Architecture: {self.name}")
        print(f"Provider: {self.provider.upper()}")
        print("=" * 50)
        print()
        print("Services:")
        for s in self.services:
            cost = s.estimated_monthly_cost()
            print(f"  [{s.service_type.upper()}] {s.name}")
            print(f"      Tier: {s.tier}, Region: {s.region}")
            print(f"      Est. Cost: ${cost}/month")
        print()
        if self.connections:
            print("Connections:")
            for from_s, to_s in self.connections:
                print(f"  {from_s} -> {to_s}")
        print()
        print(f"Total Estimated Cost: ${self.total_estimated_cost()}/month")

# Design a trading system architecture
arch = CloudArchitecture("Quantitative Trading Platform", provider='aws')

# Add services
arch.add_service("Market Data Processor", "serverless", tier="medium")
arch.add_service("Strategy Engine", "compute", tier="medium")
arch.add_service("Order Manager", "serverless", tier="small")
arch.add_service("Trade Database", "database", tier="medium")
arch.add_service("Market Data Storage", "storage", tier="medium")
arch.add_service("Event Queue", "queue", tier="medium")

# Define connections
arch.connect("Market Data Processor", "Event Queue")
arch.connect("Event Queue", "Strategy Engine")
arch.connect("Strategy Engine", "Order Manager")
arch.connect("Strategy Engine", "Trade Database")
arch.connect("Market Data Processor", "Market Data Storage")

arch.display_architecture()
# Generate Terraform configuration
print("Generated Terraform Configuration:")
print("=" * 50)
print(arch.generate_terraform())

Exercise 17.1: Service Cost Calculator (Guided)

Create a function that calculates optimal service tiers based on budget constraints.

Solution:
def calculate_optimal_tiers(services: Dict[str, str], budget: float) -> Dict:
    """
    Calculate optimal service tiers within budget.

    Args:
        services: Dict of {service_name: service_type}
        budget: Monthly budget in dollars

    Returns:
        Dict with allocations and analysis
    """
    costs = {
        'compute': {'small': 50, 'medium': 150, 'large': 400},
        'serverless': {'small': 10, 'medium': 50, 'large': 200},
        'database': {'small': 30, 'medium': 100, 'large': 500},
        'storage': {'small': 5, 'medium': 25, 'large': 100},
        'queue': {'small': 1, 'medium': 10, 'large': 50},
    }

    tiers = ['small', 'medium', 'large']
    allocations = {}
    total_cost = 0

    num_services = len(services)
    budget_per_service = budget / num_services

    for name, svc_type in services.items():
        type_costs = costs.get(svc_type, costs['compute'])

        selected_tier = 'small'
        for tier in tiers:
            tier_cost = type_costs[tier]
            if tier_cost <= budget_per_service:
                selected_tier = tier

        allocations[name] = {
            'type': svc_type,
            'tier': selected_tier,
            'cost': type_costs[selected_tier]
        }
        total_cost += type_costs[selected_tier]

    remaining = budget - total_cost

    return {
        'allocations': allocations,
        'total_cost': total_cost,
        'budget': budget,
        'remaining': remaining,
        'utilization': (total_cost / budget) * 100
    }
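The tier-selection loop above depends on `tiers` being ordered cheapest-first: each affordable tier overwrites the previous pick, so the most expensive tier that still fits wins. The same idea in isolation (`best_tier` is a hypothetical helper with illustrative costs, not part of the course code):

```python
def best_tier(tier_costs: dict, per_service_budget: float) -> str:
    """Pick the most expensive tier that still fits the budget.

    tier_costs must be ordered cheapest-first (dicts preserve
    insertion order in Python 3.7+).
    """
    chosen = next(iter(tier_costs))  # fall back to the cheapest tier
    for tier, cost in tier_costs.items():
        if cost <= per_service_budget:
            chosen = tier  # later (pricier) affordable tiers overwrite earlier ones
    return chosen

costs = {'small': 30, 'medium': 100, 'large': 500}
print(best_tier(costs, 120))  # 'medium': large (500) exceeds the budget
print(best_tier(costs, 10))   # 'small': nothing fits, fall back to cheapest
```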

Section 17.2: Containerization

Containers package your application with all its dependencies, ensuring it runs the same everywhere.

Why Docker?

  • Consistency: "Works on my machine" becomes "works everywhere"
  • Isolation: Each service runs in its own environment
  • Portability: Move between cloud providers easily
  • Scaling: Spin up multiple instances quickly
class DockerfileGenerator:
    """
    Generate Dockerfiles for trading applications.
    """
    
    TEMPLATES = {
        'python-trading': '''
# Python Trading Application
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \\
    gcc \\
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Expose port
EXPOSE {port}

# Run the application
CMD ["python", "{entrypoint}"]
''',
        
        'python-api': '''
# Python API Service
FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Environment
ENV PYTHONUNBUFFERED=1

EXPOSE {port}

# Use gunicorn for production
CMD ["gunicorn", "--bind", "0.0.0.0:{port}", "--workers", "4", "{entrypoint}:app"]
''',
        
        'python-jupyter': '''
# Jupyter Notebook for Research
FROM python:3.10

WORKDIR /notebooks

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install jupyter

# Copy notebooks
COPY . .

EXPOSE 8888

CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
'''
    }
    
    @classmethod
    def generate(cls, template_name: str, port: int = 8000, 
                 entrypoint: str = 'main') -> str:
        """Generate Dockerfile from template."""
        template = cls.TEMPLATES.get(template_name, cls.TEMPLATES['python-trading'])
        return template.format(port=port, entrypoint=entrypoint)
    
    @classmethod
    def generate_requirements(cls, packages: List[str]) -> str:
        """Generate requirements.txt content."""
        return '\n'.join(packages)


class DockerComposeGenerator:
    """
    Generate Docker Compose configurations for multi-service applications.
    """
    
    def __init__(self, project_name: str):
        self.project_name = project_name
        self.services = {}
        self.networks = ['default']
        self.volumes = []
    
    def add_service(self, name: str, image: str = None, build: str = None,
                    ports: List[str] = None, environment: Dict = None,
                    depends_on: List[str] = None, volumes: List[str] = None,
                    command: str = None):
        """Add a service to the compose file."""
        service = {}
        
        if image:
            service['image'] = image
        if build:
            service['build'] = build
        if ports:
            service['ports'] = ports
        if environment:
            service['environment'] = environment
        if depends_on:
            service['depends_on'] = depends_on
        if volumes:
            service['volumes'] = volumes
        if command:
            service['command'] = command
        
        service['restart'] = 'unless-stopped'
        
        self.services[name] = service
    
    def add_volume(self, name: str):
        """Add a named volume."""
        self.volumes.append(name)
    
    def generate(self) -> str:
        """Generate docker-compose.yml content."""
        compose = {
            'version': '3.8',
            'services': self.services
        }
        
        if self.volumes:
            compose['volumes'] = {v: {} for v in self.volumes}
        
        return yaml.dump(compose, default_flow_style=False, sort_keys=False)

# Generate Dockerfile for a trading application
print("Dockerfile for Trading Application:")
print("=" * 50)
dockerfile = DockerfileGenerator.generate('python-trading', port=8080, entrypoint='trading_bot')
print(dockerfile)
# Generate Docker Compose for complete trading system
compose = DockerComposeGenerator("trading-system")

# Database
compose.add_service(
    name='postgres',
    image='postgres:15',
    ports=['5432:5432'],
    environment={
        'POSTGRES_DB': 'trading',
        'POSTGRES_USER': 'trader',
        'POSTGRES_PASSWORD': '${DB_PASSWORD}'
    },
    volumes=['postgres_data:/var/lib/postgresql/data']
)

# Redis for caching
compose.add_service(
    name='redis',
    image='redis:7-alpine',
    ports=['6379:6379']
)

# Market data collector
compose.add_service(
    name='data-collector',
    build='./data_collector',
    environment={
        'API_KEY': '${MARKET_API_KEY}',
        'REDIS_URL': 'redis://redis:6379'
    },
    depends_on=['redis']
)

# Strategy engine
compose.add_service(
    name='strategy-engine',
    build='./strategy',
    environment={
        'DATABASE_URL': 'postgresql://trader:${DB_PASSWORD}@postgres:5432/trading',
        'REDIS_URL': 'redis://redis:6379'
    },
    depends_on=['postgres', 'redis', 'data-collector']
)

# API server
compose.add_service(
    name='api',
    build='./api',
    ports=['8000:8000'],
    environment={
        'DATABASE_URL': 'postgresql://trader:${DB_PASSWORD}@postgres:5432/trading',
        'SECRET_KEY': '${API_SECRET_KEY}'
    },
    depends_on=['postgres']
)

# Dashboard
compose.add_service(
    name='dashboard',
    build='./dashboard',
    ports=['8050:8050'],
    environment={
        'API_URL': 'http://api:8000'
    },
    depends_on=['api']
)

compose.add_volume('postgres_data')

print("Docker Compose Configuration:")
print("=" * 50)
print(compose.generate())
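The compose file above references `${DB_PASSWORD}`, `${MARKET_API_KEY}`, and `${API_SECRET_KEY}` without defining them: Docker Compose substitutes these from the shell environment or from a `.env` file sitting next to `docker-compose.yml`. A matching `.env` might look like this (placeholder values only; keep the real file out of version control):

```shell
# .env -- read automatically by docker compose; add to .gitignore
DB_PASSWORD=changeme
MARKET_API_KEY=your_market_data_key
API_SECRET_KEY=your_api_secret
```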

Exercise 17.2: Docker Service Builder (Guided)

Build a function that generates Docker Compose service configurations with proper dependencies.

Exercise
Click for solution
def build_docker_services(components: List[Dict]) -> Dict:
    """
    Build Docker Compose services with dependency resolution.

    Args:
        components: List of component specs with name, type, dependencies

    Returns:
        Docker Compose services dict
    """
    default_ports = {
        'api': 8000,
        'database': 5432,
        'cache': 6379,
        'dashboard': 8050,
        'worker': None
    }

    default_images = {
        'database': 'postgres:15',
        'cache': 'redis:7-alpine'
    }

    services = {}

    for component in components:
        name = component['name']
        svc_type = component['type']
        deps = component.get('dependencies', [])

        service = {'restart': 'unless-stopped'}

        if svc_type in default_images:
            service['image'] = default_images[svc_type]
        else:
            service['build'] = f'./{name}'

        port = default_ports.get(svc_type)
        if port:
            service['ports'] = [f'{port}:{port}']

        if deps:
            service['depends_on'] = deps

        services[name] = service

    return {
        'version': '3.8',
        'services': services
    }
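Compose starts `depends_on` targets before the services that need them; the order it implicitly computes is a topological sort of the dependency graph. A standalone sketch using Kahn's algorithm (`startup_order` is a hypothetical helper, assuming `deps` maps each service to the services it depends on):

```python
from collections import deque

def startup_order(deps: dict) -> list:
    """Return service names in a dependency-respecting start order.

    deps: {service_name: [services it depends on]}
    Raises ValueError on a dependency cycle.
    """
    indegree = {svc: len(reqs) for svc, reqs in deps.items()}
    dependents = {svc: [] for svc in deps}
    for svc, reqs in deps.items():
        for req in reqs:
            dependents[req].append(svc)

    # Start with services that depend on nothing
    queue = deque(svc for svc, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        svc = queue.popleft()
        order.append(svc)
        for nxt in dependents[svc]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

deps = {
    'postgres': [],
    'redis': [],
    'data-collector': ['redis'],
    'strategy-engine': ['postgres', 'redis', 'data-collector'],
}
print(startup_order(deps))
```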

Section 17.3: Serverless Functions

Serverless computing lets you run code without managing servers. You pay only for execution time.
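To make "pay only for execution time" concrete, here is a rough cost sketch based on AWS Lambda's published x86 rates (about $0.0000166667 per GB-second plus $0.20 per million requests; treat the numbers as illustrative, since pricing varies by region and changes over time):

```python
def lambda_monthly_cost(invocations: int, avg_ms: float, memory_mb: int,
                        gb_second_rate: float = 0.0000166667,
                        per_million_requests: float = 0.20) -> float:
    """Estimate monthly Lambda cost from invocations, duration, and memory."""
    # Compute charge: GB-seconds = invocations * seconds per call * GB allocated
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    # Request charge is billed per million invocations
    return gb_seconds * gb_second_rate + (invocations / 1_000_000) * per_million_requests

# 1M invocations/month at 200 ms with 512 MB comes out to roughly $1.87/month
print(f"${lambda_monthly_cost(1_000_000, 200, 512):.2f}")
```

Even a frequently invoked function is often far cheaper than an always-on server, which is why scheduled data fetchers and alert checkers are natural serverless candidates.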

Use Cases for Trading

Use Case Function Type Trigger
Market data processing Data pipeline Schedule/Event
Alert notifications Notification Event
Report generation Batch Schedule
API webhooks API handler HTTP
Data backup Maintenance Schedule
class LambdaFunctionGenerator:
    """
    Generate AWS Lambda function templates.
    """
    
    @staticmethod
    def market_data_fetcher() -> str:
        """Generate market data fetcher Lambda."""
        return '''
import json
import boto3
import yfinance as yf
from datetime import datetime

s3 = boto3.client('s3')
sns = boto3.client('sns')

def handler(event, context):
    """
    Fetch market data and store in S3.
    Triggered by CloudWatch Events (schedule).
    """
    # Configuration
    symbols = event.get('symbols', ['SPY', 'QQQ', 'IWM'])
    bucket = event.get('bucket', 'my-market-data-bucket')
    
    results = []
    
    for symbol in symbols:
        try:
            # Fetch data
            ticker = yf.Ticker(symbol)
            data = ticker.history(period='1d')
            
            if not data.empty:
                # Store in S3
                date_str = datetime.now().strftime('%Y-%m-%d')
                key = f"daily/{symbol}/{date_str}.json"
                
                s3.put_object(
                    Bucket=bucket,
                    Key=key,
                    Body=data.to_json(),
                    ContentType='application/json'
                )
                
                results.append({
                    'symbol': symbol,
                    'status': 'success',
                    'key': key
                })
            else:
                results.append({
                    'symbol': symbol,
                    'status': 'no_data'
                })
                
        except Exception as e:
            results.append({
                'symbol': symbol,
                'status': 'error',
                'error': str(e)
            })
    
    return {
        'statusCode': 200,
        'body': json.dumps({
            'timestamp': datetime.now().isoformat(),
            'results': results
        })
    }
'''
    
    @staticmethod
    def price_alert_checker() -> str:
        """Generate price alert checker Lambda."""
        return '''
import json
import boto3
import yfinance as yf
from datetime import datetime

dynamodb = boto3.resource('dynamodb')
sns = boto3.client('sns')

def handler(event, context):
    """
    Check price alerts and send notifications.
    Triggered by CloudWatch Events (every 5 minutes during market hours).
    """
    # Get active alerts from DynamoDB
    table = dynamodb.Table('price_alerts')
    alerts = table.scan(
        FilterExpression='active = :active',
        ExpressionAttributeValues={':active': True}
    )['Items']
    
    triggered_alerts = []
    
    for alert in alerts:
        symbol = alert['symbol']
        target_price = float(alert['target_price'])
        condition = alert['condition']  # 'above' or 'below'
        topic_arn = alert['topic_arn']
        
        # Get current price
        ticker = yf.Ticker(symbol)
        current_price = ticker.fast_info['lastPrice']
        
        # Check condition
        triggered = False
        if condition == 'above' and current_price >= target_price:
            triggered = True
        elif condition == 'below' and current_price <= target_price:
            triggered = True
        
        if triggered:
            # Send notification
            message = f"""PRICE ALERT TRIGGERED
            
Symbol: {symbol}
Condition: Price {condition} ${target_price}
Current Price: ${current_price:.2f}
Time: {datetime.now().isoformat()}
"""
            
            sns.publish(
                TopicArn=topic_arn,
                Message=message,
                Subject=f'Price Alert: {symbol}'
            )
            
            # Deactivate alert
            table.update_item(
                Key={'alert_id': alert['alert_id']},
                UpdateExpression='SET active = :false',
                ExpressionAttributeValues={':false': False}
            )
            
            triggered_alerts.append(alert['alert_id'])
    
    return {
        'statusCode': 200,
        'body': json.dumps({
            'checked': len(alerts),
            'triggered': triggered_alerts
        })
    }
'''
    
    @staticmethod
    def report_generator() -> str:
        """Generate report generator Lambda."""
        return '''
import json
import boto3
import pandas as pd
from datetime import datetime, timedelta
from io import BytesIO

s3 = boto3.client('s3')
ses = boto3.client('ses')

def handler(event, context):
    """
    Generate daily performance report.
    Triggered by CloudWatch Events (end of day).
    """
    bucket = event.get('bucket', 'my-trading-bucket')
    recipient = event.get('email', 'trader@example.com')
    
    # Load trades from S3
    date_str = datetime.now().strftime('%Y-%m-%d')
    trades_key = f"trades/{date_str}.json"
    
    try:
        response = s3.get_object(Bucket=bucket, Key=trades_key)
        trades_df = pd.read_json(response['Body'])
    except Exception:
        # No trades file for today (or unreadable) -- fall back to an empty report
        trades_df = pd.DataFrame()
    
    # Generate report
    if not trades_df.empty:
        total_pnl = trades_df['pnl'].sum()
        num_trades = len(trades_df)
        win_rate = (trades_df['pnl'] > 0).mean()
        
        report = f"""
DAILY TRADING REPORT - {date_str}
================================

Summary:
- Total Trades: {num_trades}
- Total P&L: ${total_pnl:,.2f}
- Win Rate: {win_rate:.1%}

Top Performers:
{trades_df.nlargest(3, 'pnl')[['symbol', 'pnl']].to_string()}

Worst Performers:
{trades_df.nsmallest(3, 'pnl')[['symbol', 'pnl']].to_string()}
"""
    else:
        report = f"DAILY REPORT - {date_str}\\n\\nNo trades executed today."
    
    # Send email via SES
    ses.send_email(
        Source='reports@trading-system.com',
        Destination={'ToAddresses': [recipient]},
        Message={
            'Subject': {'Data': f'Daily Trading Report - {date_str}'},
            'Body': {'Text': {'Data': report}}
        }
    )
    
    return {
        'statusCode': 200,
        'body': json.dumps({'report_sent': True})
    }
'''

# Display Lambda function examples
print("Market Data Fetcher Lambda:")
print("=" * 50)
print(LambdaFunctionGenerator.market_data_fetcher())

Exercise 17.3: Lambda Configuration Builder (Guided)

Create a function that builds Lambda function configurations with memory and timeout settings.

Exercise
Click for solution
def build_lambda_config(functions: List[Dict]) -> Dict:
    """
    Build Lambda function configurations with resource allocation.

    Args:
        functions: List of function specs

    Returns:
        Lambda configuration dict
    """
    memory_tiers = {
        'light': 128,
        'standard': 512,
        'heavy': 1024,
        'compute': 2048
    }

    timeout_presets = {
        'quick': 30,
        'standard': 60,
        'long': 300,
        'max': 900
    }

    configs = {}
    total_memory = 0

    for func in functions:
        name = func['name']
        mem_tier = func.get('memory', 'standard')
        timeout_type = func.get('timeout', 'standard')

        memory = memory_tiers.get(mem_tier, 512)
        timeout = timeout_presets.get(timeout_type, 60)

        config = {
            'function_name': name,
            'runtime': 'python3.10',
            'handler': f'{name}.handler',
            'memory_size': memory,
            'timeout': timeout,
            'environment': func.get('env', {})
        }

        if 'schedule' in func:
            config['trigger'] = {
                'type': 'schedule',
                'expression': func['schedule']
            }

        configs[name] = config
        total_memory += memory

    return {
        'functions': configs,
        'count': len(configs),
        'total_memory_mb': total_memory
    }
# Generate CloudFormation template for Lambda deployment
class CloudFormationGenerator:
    """
    Generate CloudFormation templates for serverless deployment.
    """
    
    @staticmethod
    def lambda_stack(function_name: str, runtime: str = 'python3.10',
                     memory: int = 256, timeout: int = 60) -> dict:
        """Generate CloudFormation template for Lambda function."""
        return {
            'AWSTemplateFormatVersion': '2010-09-09',
            'Description': f'Lambda function: {function_name}',
            'Parameters': {
                'Environment': {
                    'Type': 'String',
                    'Default': 'dev',
                    'AllowedValues': ['dev', 'staging', 'prod']
                }
            },
            'Resources': {
                'LambdaExecutionRole': {
                    'Type': 'AWS::IAM::Role',
                    'Properties': {
                        'AssumeRolePolicyDocument': {
                            'Version': '2012-10-17',
                            'Statement': [{
                                'Effect': 'Allow',
                                'Principal': {'Service': 'lambda.amazonaws.com'},
                                'Action': 'sts:AssumeRole'
                            }]
                        },
                        'ManagedPolicyArns': [
                            'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'
                        ]
                    }
                },
                'LambdaFunction': {
                    'Type': 'AWS::Lambda::Function',
                    'Properties': {
                        'FunctionName': {'Fn::Sub': f'{function_name}-${{Environment}}'},
                        'Runtime': runtime,
                        'Handler': 'handler.handler',
                        'MemorySize': memory,
                        'Timeout': timeout,
                        'Role': {'Fn::GetAtt': ['LambdaExecutionRole', 'Arn']},
                        'Environment': {
                            'Variables': {
                                'ENVIRONMENT': {'Ref': 'Environment'}
                            }
                        }
                    }
                },
                # NOTE: a complete stack also needs an AWS::Lambda::Permission
                # granting events.amazonaws.com the right to invoke the function.
                'ScheduleRule': {
                    'Type': 'AWS::Events::Rule',
                    'Properties': {
                        'ScheduleExpression': 'rate(5 minutes)',
                        'State': 'ENABLED',
                        'Targets': [{
                            'Id': 'LambdaTarget',
                            'Arn': {'Fn::GetAtt': ['LambdaFunction', 'Arn']}
                        }]
                    }
                }
            },
            'Outputs': {
                'FunctionArn': {
                    'Value': {'Fn::GetAtt': ['LambdaFunction', 'Arn']}
                }
            }
        }

cf_template = CloudFormationGenerator.lambda_stack(
    'market-data-fetcher',
    memory=512,
    timeout=120
)

print("CloudFormation Template:")
print("=" * 50)
print(yaml.dump(cf_template, default_flow_style=False))
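Before deploying a generated template, it is worth sanity-checking that intrinsic references resolve. The authoritative check is `aws cloudformation validate-template`, but a minimal local pass over the dict can already catch a `Fn::GetAtt` pointing at an undeclared resource (`undeclared_getatt_targets` is a hypothetical helper, not part of the course code):

```python
def undeclared_getatt_targets(template: dict) -> list:
    """Walk a CloudFormation template dict and return Fn::GetAtt logical IDs
    that are missing from the Resources section."""
    declared = set(template.get('Resources', {}))
    missing = []

    def walk(node):
        if isinstance(node, dict):
            target = node.get('Fn::GetAtt')
            # Fn::GetAtt is [LogicalId, AttributeName]
            if isinstance(target, list) and target and target[0] not in declared:
                missing.append(target[0])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(template)
    return missing

broken = {'Resources': {'LambdaFunction': {
    'Properties': {'Role': {'Fn::GetAtt': ['LambdaExecutionRole', 'Arn']}}}}}
print(undeclared_getatt_targets(broken))  # ['LambdaExecutionRole']
```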

Section 17.4: CI/CD Pipelines

CI/CD (Continuous Integration/Continuous Deployment) automates testing and deployment.

Pipeline Stages

  1. Build: Compile code, install dependencies
  2. Test: Run unit tests, integration tests
  3. Analyze: Code quality, security scanning
  4. Deploy: Push to staging/production
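GitHub Actions runs jobs as a dependency graph gated by `needs`: a job becomes eligible only once everything it needs has completed. A toy model of that gating (illustrative only, not Actions' actual scheduler; it accepts `needs` as either a string or a list, matching the workflow syntax):

```python
def runnable_jobs(jobs: dict, completed: set) -> list:
    """Return jobs whose prerequisites are all satisfied and that haven't run.

    jobs: {job_name: {'needs': 'other_job'}} or {'needs': [jobs...]}
    """
    def needs_of(spec):
        needs = spec.get('needs', [])
        return [needs] if isinstance(needs, str) else needs

    return [name for name, spec in jobs.items()
            if name not in completed
            and all(dep in completed for dep in needs_of(spec))]

pipeline = {
    'test': {},
    'build': {'needs': 'test'},
    'deploy-staging': {'needs': 'build'},
    'deploy-production': {'needs': 'deploy-staging'},
}
print(runnable_jobs(pipeline, set()))       # ['test']
print(runnable_jobs(pipeline, {'test'}))    # ['build']
```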
class CICDPipelineGenerator:
    """
    Generate CI/CD pipeline configurations.
    """
    
    @staticmethod
    def github_actions_workflow() -> str:
        """Generate GitHub Actions workflow for trading application."""
        workflow = {
            'name': 'Trading System CI/CD',
            'on': {
                'push': {'branches': ['main', 'develop']},
                'pull_request': {'branches': ['main']}
            },
            'env': {
                'PYTHON_VERSION': '3.10',
                'AWS_REGION': 'us-east-1'
            },
            'jobs': {
                'test': {
                    'runs-on': 'ubuntu-latest',
                    'steps': [
                        {'uses': 'actions/checkout@v4'},
                        {
                            'name': 'Set up Python',
                            'uses': 'actions/setup-python@v4',
                            'with': {'python-version': '${{ env.PYTHON_VERSION }}'}
                        },
                        {
                            'name': 'Install dependencies',
                            'run': 'pip install -r requirements.txt -r requirements-dev.txt'
                        },
                        {
                            'name': 'Run linting',
                            'run': 'flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics'
                        },
                        {
                            'name': 'Run tests',
                            'run': 'pytest tests/ -v --cov=src --cov-report=xml'
                        },
                        {
                            'name': 'Upload coverage',
                            'uses': 'codecov/codecov-action@v3',
                            'with': {'files': 'coverage.xml'}
                        }
                    ]
                },
                'build': {
                    'needs': 'test',
                    'runs-on': 'ubuntu-latest',
                    'if': "github.ref == 'refs/heads/main'",
                    'steps': [
                        {'uses': 'actions/checkout@v4'},
                        {
                            'name': 'Configure AWS credentials',
                            'uses': 'aws-actions/configure-aws-credentials@v4',
                            'with': {
                                'aws-access-key-id': '${{ secrets.AWS_ACCESS_KEY_ID }}',
                                'aws-secret-access-key': '${{ secrets.AWS_SECRET_ACCESS_KEY }}',
                                'aws-region': '${{ env.AWS_REGION }}'
                            }
                        },
                        {
                            'name': 'Login to ECR',
                            'uses': 'aws-actions/amazon-ecr-login@v2'
                        },
                        {
                            'name': 'Build and push Docker image',
                            'run': ('docker build -t trading-system .\n'
                                    'docker tag trading-system:latest ${{ secrets.ECR_REGISTRY }}/trading-system:${{ github.sha }}\n'
                                    'docker push ${{ secrets.ECR_REGISTRY }}/trading-system:${{ github.sha }}')
                        }
                    ]
                },
                'deploy-staging': {
                    'needs': 'build',
                    'runs-on': 'ubuntu-latest',
                    'environment': 'staging',
                    'steps': [
                        {'uses': 'actions/checkout@v4'},
                        {
                            'name': 'Deploy to ECS Staging',
                            'run': 'aws ecs update-service --cluster staging --service trading-system --force-new-deployment'
                        }
                    ]
                },
                'deploy-production': {
                    'needs': 'deploy-staging',
                    'runs-on': 'ubuntu-latest',
                    'environment': 'production',
                    'steps': [
                        {'uses': 'actions/checkout@v4'},
                        {
                            'name': 'Deploy to ECS Production',
                            'run': 'aws ecs update-service --cluster production --service trading-system --force-new-deployment'
                        }
                    ]
                }
            }
        }
        
        return yaml.dump(workflow, default_flow_style=False, sort_keys=False)
    
    @staticmethod
    def pre_commit_config() -> str:
        """Generate pre-commit configuration."""
        config = {
            'repos': [
                {
                    'repo': 'https://github.com/pre-commit/pre-commit-hooks',
                    'rev': 'v4.4.0',
                    'hooks': [
                        {'id': 'trailing-whitespace'},
                        {'id': 'end-of-file-fixer'},
                        {'id': 'check-yaml'},
                        {'id': 'check-json'},
                        {'id': 'check-merge-conflict'}
                    ]
                },
                {
                    'repo': 'https://github.com/psf/black',
                    'rev': '23.9.1',
                    'hooks': [{'id': 'black', 'language_version': 'python3.10'}]
                },
                {
                    'repo': 'https://github.com/PyCQA/flake8',
                    'rev': '6.1.0',
                    'hooks': [{'id': 'flake8'}]
                },
                {
                    'repo': 'https://github.com/PyCQA/isort',
                    'rev': '5.12.0',
                    'hooks': [{'id': 'isort'}]
                },
                {
                    'repo': 'local',
                    'hooks': [
                        {
                            'id': 'pytest',
                            'name': 'pytest',
                            'entry': 'pytest tests/ -x',
                            'language': 'system',
                            'pass_filenames': False,
                            'always_run': True
                        }
                    ]
                }
            ]
        }
        
        return yaml.dump(config, default_flow_style=False)

print("GitHub Actions Workflow:")
print("=" * 50)
print(CICDPipelineGenerator.github_actions_workflow())
print("Pre-commit Configuration:")
print("=" * 50)
print(CICDPipelineGenerator.pre_commit_config())

Exercise 17.4: Complete Infrastructure Generator (Open-ended)

Build a class that generates complete infrastructure configurations for a trading platform, including cloud architecture, Docker setup, and CI/CD pipeline.

Exercise
Click for solution
class InfrastructureGenerator:
    """
    Generate complete infrastructure configurations for trading platforms.
    """

    def __init__(self, project_name: str, provider: str = 'aws'):
        self.project_name = project_name
        self.provider = provider
        self.services = []
        self.environments = ['dev', 'staging', 'prod']

    def add_service(self, name: str, service_type: str, tier: str = 'medium'):
        """Add a service to the infrastructure."""
        self.services.append({
            'name': name,
            'type': service_type,
            'tier': tier
        })

    def generate_architecture(self) -> Dict:
        """Generate cloud architecture configuration."""
        costs = {
            'compute': {'small': 50, 'medium': 150, 'large': 400},
            'serverless': {'small': 10, 'medium': 50, 'large': 200},
            'database': {'small': 30, 'medium': 100, 'large': 500},
            'storage': {'small': 5, 'medium': 25, 'large': 100},
            'queue': {'small': 1, 'medium': 10, 'large': 50},
        }

        architecture = {
            'provider': self.provider,
            'project': self.project_name,
            'services': [],
            'total_cost': 0
        }

        for svc in self.services:
            cost = costs.get(svc['type'], {}).get(svc['tier'], 50)
            architecture['services'].append({
                **svc,
                'estimated_cost': cost
            })
            architecture['total_cost'] += cost

        return architecture

    def generate_docker_compose(self) -> str:
        """Generate Docker Compose configuration."""
        compose = DockerComposeGenerator(self.project_name)

        for svc in self.services:
            if svc['type'] == 'database':
                compose.add_service(
                    name=svc['name'],
                    image='postgres:15',
                    ports=['5432:5432'],
                    environment={'POSTGRES_PASSWORD': '${DB_PASSWORD}'}
                )
            elif svc['type'] == 'storage':
                compose.add_service(
                    name=svc['name'],
                    image='redis:7-alpine',
                    ports=['6379:6379']
                )
            else:
                compose.add_service(
                    name=svc['name'],
                    build=f"./{svc['name']}"
                )

        return compose.generate()

    def generate_cicd(self) -> str:
        """Generate CI/CD pipeline configuration."""
        return CICDPipelineGenerator.github_actions_workflow()

    def generate_env_template(self) -> str:
        """Generate environment variable template."""
        lines = ['# Environment Configuration', '']

        for svc in self.services:
            lines.append(f'# {svc["name"].upper()}')
            if svc['type'] == 'database':
                lines.append('DATABASE_URL=postgresql://user:password@localhost:5432/db')
                lines.append('DB_PASSWORD=changeme')
            elif svc['type'] == 'storage':
                lines.append('REDIS_URL=redis://localhost:6379')
            lines.append('')

        lines.extend([
            '# API Keys',
            'MARKET_DATA_API_KEY=your_key',
            'BROKER_API_KEY=your_key',
            '',
            '# Application',
            'ENVIRONMENT=development',
            'SECRET_KEY=changeme'
        ])

        return '\n'.join(lines)

    def generate_all(self) -> Dict[str, str]:
        """Generate all infrastructure files."""
        return {
            'architecture.json': json.dumps(self.generate_architecture(), indent=2),
            'docker-compose.yml': self.generate_docker_compose(),
            '.github/workflows/ci-cd.yml': self.generate_cicd(),
            '.env.example': self.generate_env_template()
        }

# Test
infra = InfrastructureGenerator('QuantTradingPlatform', 'aws')
infra.add_service('api', 'compute', 'medium')
infra.add_service('strategy-engine', 'compute', 'large')
infra.add_service('postgres', 'database', 'medium')
infra.add_service('redis', 'storage', 'small')
infra.add_service('data-fetcher', 'serverless', 'medium')

files = infra.generate_all()
print(f"Generated {len(files)} files:")
for filename in files.keys():
    print(f"  - {filename}")

arch = infra.generate_architecture()
print(f"\nTotal Monthly Cost: ${arch['total_cost']}")

Exercise 17.5: Multi-Environment Deployer (Open-ended)

Create a deployment manager that handles multiple environments with proper configuration isolation.

Exercise
Click for solution
class MultiEnvironmentDeployer:
    """
    Manage deployments across multiple environments.
    """

    def __init__(self, project_name: str):
        self.project_name = project_name
        self.environments = {}
        self.services = {}

        # Tier scaling by environment
        self.tier_mapping = {
            'dev': {'small': 'small', 'medium': 'small', 'large': 'medium'},
            'staging': {'small': 'small', 'medium': 'medium', 'large': 'medium'},
            'prod': {'small': 'medium', 'medium': 'medium', 'large': 'large'}
        }

    def add_environment(self, name: str, region: str, config: Dict = None):
        """Add an environment."""
        self.environments[name] = {
            'name': name,
            'region': region,
            'config': config or {},
            'services': {}
        }

    def add_service(self, name: str, service_type: str, base_tier: str = 'medium'):
        """Add a service (applies to all environments with tier scaling)."""
        self.services[name] = {
            'type': service_type,
            'base_tier': base_tier
        }

    def get_environment_config(self, env_name: str) -> Dict:
        """Get configuration for specific environment."""
        if env_name not in self.environments:
            raise ValueError(f"Unknown environment: {env_name}")

        env = self.environments[env_name]
        tier_map = self.tier_mapping.get(env_name, self.tier_mapping['dev'])

        config = {
            'environment': env_name,
            'region': env['region'],
            'services': {}
        }

        for svc_name, svc_config in self.services.items():
            scaled_tier = tier_map[svc_config['base_tier']]
            config['services'][svc_name] = {
                'type': svc_config['type'],
                'tier': scaled_tier
            }

        return config

    def generate_terraform(self, env_name: str) -> str:
        """Generate Terraform for specific environment."""
        config = self.get_environment_config(env_name)

        tf_lines = [
            f'# Terraform configuration for {self.project_name}',
            f'# Environment: {env_name}',
            '',
            'provider "aws" {',
            f'  region = "{config["region"]}"',
            '}',
            '',
            'locals {',
            f'  environment = "{env_name}"',
            f'  project     = "{self.project_name}"',
            '}',
            ''
        ]

        for svc_name, svc_config in config['services'].items():
            tf_lines.append(f'# {svc_name}')
            tf_lines.append(f'# Type: {svc_config["type"]}, Tier: {svc_config["tier"]}')
            tf_lines.append('')

        return '\n'.join(tf_lines)

    def validate_consistency(self) -> Dict:
        """Validate configuration consistency."""
        issues = []

        # Check all environments have required services
        for env_name in self.environments:
            config = self.get_environment_config(env_name)
            if len(config['services']) != len(self.services):
                issues.append(f"{env_name}: service count mismatch")

        # Check tier scaling makes sense
        for svc_name, svc_config in self.services.items():
            dev_tier = self.tier_mapping['dev'][svc_config['base_tier']]
            prod_tier = self.tier_mapping['prod'][svc_config['base_tier']]

            tier_order = ['small', 'medium', 'large']
            if tier_order.index(dev_tier) > tier_order.index(prod_tier):
                issues.append(f"{svc_name}: dev tier > prod tier")

        return {
            'valid': len(issues) == 0,
            'issues': issues,
            'environments_checked': list(self.environments.keys()),
            'services_checked': list(self.services.keys())
        }

    def estimate_costs(self) -> Dict:
        """Estimate costs per environment."""
        costs = {
            'compute': {'small': 50, 'medium': 150, 'large': 400},
            'serverless': {'small': 10, 'medium': 50, 'large': 200},
            'database': {'small': 30, 'medium': 100, 'large': 500},
            'storage': {'small': 5, 'medium': 25, 'large': 100},
        }

        estimates = {}

        for env_name in self.environments:
            config = self.get_environment_config(env_name)
            total = 0

            for svc_config in config['services'].values():
                svc_costs = costs.get(svc_config['type'], costs['compute'])
                total += svc_costs.get(svc_config['tier'], 50)

            estimates[env_name] = total

        estimates['total'] = sum(estimates.values())
        return estimates

# Test
deployer = MultiEnvironmentDeployer('TradingPlatform')

# Add environments
deployer.add_environment('dev', 'us-east-1')
deployer.add_environment('staging', 'us-east-1')
deployer.add_environment('prod', 'us-east-1')

# Add services
deployer.add_service('api', 'compute', 'medium')
deployer.add_service('strategy', 'compute', 'large')
deployer.add_service('database', 'database', 'medium')

# Validate
validation = deployer.validate_consistency()
print(f"Configuration Valid: {validation['valid']}")

# Estimate costs
costs = deployer.estimate_costs()
print(f"\nMonthly Cost Estimates:")
for env, cost in costs.items():
    print(f"  {env}: ${cost}")

Exercise 17.6: Deployment Health Checker (Open-ended)

Create a system that monitors deployment health and generates status reports.

Exercise
Click for solution
import random
from datetime import datetime, timedelta

class DeploymentHealthChecker:
    """
    Monitor deployment health and generate reports.
    """

    def __init__(self, project_name: str):
        self.project_name = project_name
        self.services = {}
        self.health_history = []

    def add_service(self, name: str, endpoint: str, critical: bool = False):
        """Register a service for health monitoring."""
        self.services[name] = {
            'endpoint': endpoint,
            'critical': critical,
            'checks': []
        }

    def simulate_health_check(self, service_name: str) -> Dict:
        """Simulate a health check (normally would make HTTP request)."""
        # Simulate realistic response times and occasional failures
        is_healthy = random.random() > 0.05  # 95% success rate
        response_time = random.uniform(10, 200) if is_healthy else None

        result = {
            'timestamp': datetime.now().isoformat(),
            'service': service_name,
            'healthy': is_healthy,
            'response_time_ms': response_time,
            'status_code': 200 if is_healthy else 500
        }

        if service_name in self.services:
            self.services[service_name]['checks'].append(result)

        return result

    def run_all_checks(self) -> Dict:
        """Run health checks on all services."""
        results = []
        critical_failures = []

        for name, config in self.services.items():
            result = self.simulate_health_check(name)
            results.append(result)

            if not result['healthy'] and config['critical']:
                critical_failures.append(name)

        healthy_count = sum(1 for r in results if r['healthy'])

        summary = {
            'timestamp': datetime.now().isoformat(),
            'total_services': len(results),
            'healthy': healthy_count,
            'unhealthy': len(results) - healthy_count,
            'availability': healthy_count / len(results) * 100 if results else 0,
            'critical_failures': critical_failures,
            'overall_status': 'CRITICAL' if critical_failures else (
                'HEALTHY' if healthy_count == len(results) else 'DEGRADED'
            ),
            'details': results
        }

        self.health_history.append(summary)
        return summary

    def get_service_stats(self, service_name: str) -> Dict:
        """Get statistics for a specific service."""
        if service_name not in self.services:
            return {'error': 'Service not found'}

        checks = self.services[service_name]['checks']
        if not checks:
            return {'error': 'No health checks recorded'}

        healthy_checks = [c for c in checks if c['healthy']]
        response_times = [c['response_time_ms'] for c in healthy_checks if c['response_time_ms']]

        return {
            'service': service_name,
            'total_checks': len(checks),
            'successful': len(healthy_checks),
            'availability': len(healthy_checks) / len(checks) * 100,
            'avg_response_time': sum(response_times) / len(response_times) if response_times else None,
            'min_response_time': min(response_times) if response_times else None,
            'max_response_time': max(response_times) if response_times else None
        }

    def generate_report(self) -> str:
        """Generate health status report."""
        latest = self.run_all_checks()

        report_lines = [
            f"DEPLOYMENT HEALTH REPORT - {self.project_name}",
            "=" * 50,
            f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
            "",
            f"Overall Status: {latest['overall_status']}",
            f"Availability: {latest['availability']:.1f}%",
            f"Services: {latest['healthy']}/{latest['total_services']} healthy",
            ""
        ]

        if latest['critical_failures']:
            report_lines.append("CRITICAL FAILURES:")
            for svc in latest['critical_failures']:
                report_lines.append(f"  - {svc} [IMMEDIATE ACTION REQUIRED]")
            report_lines.append("")

        report_lines.append("Service Details:")
        for detail in latest['details']:
            status = "OK" if detail['healthy'] else "FAIL"
            rt = f"{detail['response_time_ms']:.0f}ms" if detail['response_time_ms'] else "N/A"
            report_lines.append(f"  [{status}] {detail['service']}: {rt}")

        # Add recommendations
        report_lines.extend(["", "Recommendations:"])
        for detail in latest['details']:
            if not detail['healthy']:
                report_lines.append(f"  - Investigate {detail['service']}: check logs, restart if needed")
            elif detail['response_time_ms'] and detail['response_time_ms'] > 150:
                report_lines.append(f"  - {detail['service']}: high latency, consider scaling")

        return "\n".join(report_lines)

# Test
checker = DeploymentHealthChecker('TradingPlatform')
checker.add_service('api', 'http://api:8000/health', critical=True)
checker.add_service('strategy-engine', 'http://strategy:8001/health', critical=True)
checker.add_service('database', 'http://postgres:5432', critical=True)
checker.add_service('redis', 'http://redis:6379/ping', critical=False)
checker.add_service('dashboard', 'http://dashboard:8050/health', critical=False)

# Run multiple checks
for _ in range(5):
    checker.run_all_checks()

# Generate report
print(checker.generate_report())

Module Project: Cloud Deployment Template

Build a complete deployment template generator for a trading system.

def generate_trading_requirements():
    """
    Generate requirements.txt for a trading application.
    """
    packages = [
        # Core data handling
        'pandas>=2.0.0',
        'numpy>=1.24.0',
        
        # Market data
        'yfinance>=0.2.0',
        
        # Web framework
        'fastapi>=0.100.0',
        'uvicorn>=0.23.0',
        'gunicorn>=21.0.0',
        
        # Database
        'sqlalchemy>=2.0.0',
        'psycopg2-binary>=2.9.0',
        'alembic>=1.12.0',
        
        # Visualization
        'plotly>=5.15.0',
        'dash>=2.14.0',
        
        # Utilities
        'python-dotenv>=1.0.0',
        'pydantic>=2.0.0',
        'redis>=4.6.0',
        
        # Scheduling
        'schedule>=1.2.0',
        'celery>=5.3.0',
        
        # Scientific computing
        'scipy>=1.11.0',
    ]
    
    return '\n'.join(packages)


class CloudDeploymentTemplate:
    """
    Complete cloud deployment template generator.
    
    Features:
    - Multi-environment support
    - Container configuration
    - Infrastructure as code
    - CI/CD pipeline
    """
    
    def __init__(self, project_name: str, provider: str = 'aws'):
        self.project_name = project_name
        self.provider = provider
        self.architecture = CloudArchitecture(project_name, provider)
        self.files = {}
    
    def add_standard_services(self):
        """Add standard services for a trading system."""
        # Core services
        self.architecture.add_service("api-server", "compute", "medium")
        self.architecture.add_service("strategy-engine", "compute", "medium")
        self.architecture.add_service("data-processor", "serverless", "medium")
        self.architecture.add_service("database", "database", "medium")
        self.architecture.add_service("cache", "storage", "small")
        self.architecture.add_service("message-queue", "queue", "medium")
        
        # Connections
        self.architecture.connect("data-processor", "message-queue")
        self.architecture.connect("message-queue", "strategy-engine")
        self.architecture.connect("strategy-engine", "database")
        self.architecture.connect("api-server", "database")
        self.architecture.connect("api-server", "cache")
    
    def generate_all_files(self) -> Dict[str, str]:
        """Generate all deployment files."""
        # Dockerfile
        self.files['Dockerfile'] = DockerfileGenerator.generate(
            'python-api', port=8000, entrypoint='main'
        )
        
        # Docker Compose
        compose = DockerComposeGenerator(self.project_name)
        compose.add_service('api', build='.', ports=['8000:8000'],
                           environment={'DATABASE_URL': '${DATABASE_URL}'})
        compose.add_service('postgres', image='postgres:15',
                           environment={'POSTGRES_PASSWORD': '${DB_PASSWORD}'})
        compose.add_service('redis', image='redis:7-alpine')
        self.files['docker-compose.yml'] = compose.generate()
        
        # Requirements
        self.files['requirements.txt'] = generate_trading_requirements()
        
        # GitHub Actions
        self.files['.github/workflows/ci-cd.yml'] = CICDPipelineGenerator.github_actions_workflow()
        
        # Pre-commit
        self.files['.pre-commit-config.yaml'] = CICDPipelineGenerator.pre_commit_config()
        
        # Terraform
        self.files['terraform/main.tf'] = self.architecture.generate_terraform()
        
        # Environment template
        self.files['.env.example'] = self._generate_env_template()
        
        # Makefile
        self.files['Makefile'] = self._generate_makefile()
        
        # README
        self.files['README.md'] = self._generate_readme()
        
        return self.files
    
    def _generate_env_template(self) -> str:
        """Generate environment variable template."""
        return '''# Database
DATABASE_URL=postgresql://user:password@localhost:5432/trading
DB_PASSWORD=your_secure_password

# Redis
REDIS_URL=redis://localhost:6379

# API Keys
MARKET_DATA_API_KEY=your_api_key
BROKER_API_KEY=your_broker_key
BROKER_SECRET=your_broker_secret

# AWS (for deployment)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1

# Application
ENVIRONMENT=development
DEBUG=true
SECRET_KEY=your_secret_key_for_jwt
'''
    
    def _generate_makefile(self) -> str:
        """Generate Makefile for common commands."""
        return f'''# Makefile for {self.project_name}

.PHONY: install test lint run docker-build docker-up docker-down deploy

install:
\tpip install -r requirements.txt
\tpre-commit install

test:
\tpytest tests/ -v --cov=src

lint:
\tflake8 src/ tests/
\tblack --check src/ tests/

format:
\tblack src/ tests/
\tisort src/ tests/

run:
\tuvicorn main:app --reload --port 8000

docker-build:
\tdocker-compose build

docker-up:
\tdocker-compose up -d

docker-down:
\tdocker-compose down

docker-logs:
\tdocker-compose logs -f

deploy-staging:
\tcd terraform && terraform workspace select staging
\tcd terraform && terraform apply -auto-approve

deploy-prod:
\tcd terraform && terraform workspace select production
\tcd terraform && terraform apply
'''
    
    def _generate_readme(self) -> str:
        """Generate README documentation."""
        return f'''# {self.project_name}

Quantitative trading system deployed on {self.provider.upper()}.

## Quick Start

```bash
# Clone repository
git clone <repo-url>
cd {self.project_name.lower().replace(" ", "-")}

# Setup environment
cp .env.example .env
make install

# Run locally
make docker-up
```

## Architecture

- **API Server**: REST API for client access
- **Strategy Engine**: Executes trading strategies
- **Data Processor**: Collects and processes market data
- **Database**: PostgreSQL for persistent storage
- **Cache**: Redis for high-speed data access

## Deployment

```bash
# Deploy to staging
make deploy-staging

# Deploy to production
make deploy-prod
```

## Development

```bash
# Run tests
make test

# Lint code
make lint

# Format code
make format
```

## Estimated Monthly Cost

${self.architecture.total_estimated_cost()}/month (varies with usage)
'''
    
    def display_summary(self):
        """Display deployment template summary."""
        print(f"Cloud Deployment Template: {self.project_name}")
        print("=" * 60)
        print()
        
        print("Generated Files:")
        for filename in self.files.keys():
            print(f"  - {filename}")
        print()
        
        self.architecture.display_architecture()

# Generate complete deployment template
template = CloudDeploymentTemplate("Quant Trading Platform", provider='aws')
template.add_standard_services()
files = template.generate_all_files()

template.display_summary()
# Display generated Makefile
print("\nGenerated Makefile:")
print("=" * 50)
print(files['Makefile'])

Key Takeaways

Cloud Architecture

  • Choose provider based on your needs (AWS for breadth, GCP for ML, Azure for enterprise)
  • Design for scalability and fault tolerance
  • Use Infrastructure as Code (Terraform, CloudFormation)

Containerization

  • Docker ensures consistent environments
  • Use multi-stage builds to reduce image size
  • Docker Compose for local development

Serverless

  • Ideal for event-driven workloads
  • Pay only for execution time
  • Cold starts can add latency

CI/CD

  • Automate testing and deployment
  • Use pre-commit hooks for code quality
  • Deploy to staging before production

Best Practices

  1. Never commit secrets - use environment variables
  2. Tag Docker images with git commit SHA
  3. Run tests before every deployment
  4. Monitor costs and set billing alerts
  5. Use multiple environments (dev, staging, prod)
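
Practice 2 above is easy to automate. A hedged sketch (the function name and fallback value are illustrative) that derives an image tag from the current commit, falling back to a default when run outside a git checkout:

```python
import subprocess

def docker_image_tag(image: str, default: str = 'dev') -> str:
    """Tag an image with the short git commit SHA, e.g. 'trading-api:3f2c1ab'."""
    try:
        sha = subprocess.check_output(
            ['git', 'rev-parse', '--short', 'HEAD'],
            stderr=subprocess.DEVNULL, text=True,
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        sha = default  # not inside a git repository (or git not installed)
    return f'{image}:{sha}'

print(docker_image_tag('trading-api'))
```

Tagging by commit SHA (rather than `latest`) makes every deployed image traceable to the exact code that built it, which simplifies rollbacks.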

Next: Module 18 - Performance Monitoring

Module 18: 24/7 Operation

Part 5: Production & Infrastructure

Duration ~2.5 hours
Exercises 6 (3 guided + 3 open-ended)

Learning Objectives

By the end of this module, you will be able to:

  • Implement comprehensive system monitoring with metrics collection
  • Design and configure alerting systems for trading operations
  • Manage incidents with structured response processes
  • Build backup and recovery strategies for critical data


# Environment setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Callable
from enum import Enum
import random
import json
import warnings
warnings.filterwarnings('ignore')

print("Module 18: 24/7 Operation")
print("=" * 40)

Section 18.1: System Monitoring

You can't fix what you can't see. Monitoring is the foundation of reliable operations.

What to Monitor

| Category       | Metrics                         | Why It Matters      |
|----------------|---------------------------------|---------------------|
| Infrastructure | CPU, memory, disk, network      | Resource exhaustion |
| Application    | Latency, errors, throughput     | User experience     |
| Business       | Orders executed, PnL, positions | Trading outcomes    |
| Dependencies   | Database, API, broker           | External failures   |

The Four Golden Signals (SRE)

  1. Latency: How long requests take
  2. Traffic: How many requests you're handling
  3. Errors: Rate of failed requests
  4. Saturation: How "full" your service is
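Before building the full collector below, the four signals can be pulled from a window of raw request records in a few lines — a minimal sketch with made-up numbers (the saturation figure is a placeholder for whatever utilization metric your service exposes):

```python
import numpy as np

# Hypothetical one-minute window of request records: (latency_ms, ok)
window = [(35, True), (48, True), (410, False), (52, True), (61, True),
          (44, True), (390, False), (57, True), (49, True), (66, True)]

latencies = np.array([ms for ms, _ in window])
errors = sum(1 for _, ok in window if not ok)

signals = {
    'latency_p95_ms': float(np.percentile(latencies, 95)),  # 1. latency
    'traffic_rpm': len(window),                             # 2. traffic
    'error_rate': errors / len(window),                     # 3. errors
    'saturation': 0.42,                                     # 4. e.g. worker-pool utilization (made up)
}
print(signals)
```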
class MetricType(Enum):
    GAUGE = "gauge"      # Current value (e.g., CPU usage)
    COUNTER = "counter"  # Cumulative count (e.g., total requests)
    HISTOGRAM = "histogram"  # Distribution (e.g., latency)

@dataclass
class Metric:
    """A single metric measurement."""
    name: str
    value: float
    timestamp: datetime
    labels: Dict[str, str] = field(default_factory=dict)
    metric_type: MetricType = MetricType.GAUGE

class MetricsCollector:
    """
    Collects and stores system metrics.
    """
    
    def __init__(self, retention_hours: int = 24):
        self.retention_hours = retention_hours
        self.metrics: Dict[str, List[Metric]] = {}
        self.counters: Dict[str, float] = {}
    
    def record(self, name: str, value: float, labels: Dict = None,
               metric_type: MetricType = MetricType.GAUGE):
        """Record a metric value."""
        metric = Metric(
            name=name,
            value=value,
            timestamp=datetime.now(),
            labels=labels or {},
            metric_type=metric_type
        )
        
        if name not in self.metrics:
            self.metrics[name] = []
        
        self.metrics[name].append(metric)
        self._cleanup(name)
    
    def increment(self, name: str, value: float = 1.0):
        """Increment a counter metric."""
        if name not in self.counters:
            self.counters[name] = 0
        self.counters[name] += value
        self.record(name, self.counters[name], metric_type=MetricType.COUNTER)
    
    def _cleanup(self, name: str):
        """Remove old metrics beyond retention period."""
        cutoff = datetime.now() - timedelta(hours=self.retention_hours)
        self.metrics[name] = [
            m for m in self.metrics[name] if m.timestamp > cutoff
        ]
    
    def get_latest(self, name: str) -> Optional[float]:
        """Get the most recent value for a metric."""
        if name not in self.metrics or not self.metrics[name]:
            return None
        return self.metrics[name][-1].value
    
    def get_series(self, name: str, hours: float = 1.0) -> pd.DataFrame:
        """Get time series data for a metric."""
        if name not in self.metrics:
            return pd.DataFrame()
        
        cutoff = datetime.now() - timedelta(hours=hours)
        data = [
            {'timestamp': m.timestamp, 'value': m.value}
            for m in self.metrics[name]
            if m.timestamp > cutoff
        ]
        
        return pd.DataFrame(data)
    
    def get_stats(self, name: str, hours: float = 1.0) -> Dict:
        """Get statistics for a metric over time period."""
        series = self.get_series(name, hours)
        
        if series.empty:
            return {}
        
        values = series['value'].values
        
        return {
            'min': float(np.min(values)),
            'max': float(np.max(values)),
            'mean': float(np.mean(values)),
            'std': float(np.std(values)),
            'p50': float(np.percentile(values, 50)),
            'p95': float(np.percentile(values, 95)),
            'p99': float(np.percentile(values, 99)),
            'count': len(values)
        }

class HealthChecker:
    """
    Performs health checks on system components.
    """
    
    def __init__(self):
        self.checks: Dict[str, Callable] = {}
        self.results: Dict[str, Dict] = {}
    
    def register_check(self, name: str, check_func: Callable):
        """Register a health check function."""
        self.checks[name] = check_func
    
    def run_checks(self) -> Dict[str, Dict]:
        """Run all registered health checks."""
        self.results = {}
        
        for name, check_func in self.checks.items():
            start_time = datetime.now()
            try:
                result = check_func()
                healthy = result.get('healthy', True)
                message = result.get('message', 'OK')
            except Exception as e:
                healthy = False
                message = str(e)
            
            duration = (datetime.now() - start_time).total_seconds() * 1000
            
            self.results[name] = {
                'healthy': healthy,
                'message': message,
                'duration_ms': duration,
                'timestamp': datetime.now().isoformat()
            }
        
        return self.results
    
    def is_healthy(self) -> bool:
        """Check if all components are healthy."""
        if not self.results:
            self.run_checks()
        return all(r['healthy'] for r in self.results.values())
    
    def get_status(self) -> Dict:
        """Get overall system status."""
        if not self.results:
            self.run_checks()
        
        healthy_count = sum(1 for r in self.results.values() if r['healthy'])
        total_count = len(self.results)
        
        return {
            'status': 'healthy' if self.is_healthy() else 'unhealthy',
            'healthy_checks': healthy_count,
            'total_checks': total_count,
            'timestamp': datetime.now().isoformat(),
            'checks': self.results
        }

# Create metrics collector and simulate data
collector = MetricsCollector()

# Simulate collecting metrics over time
np.random.seed(42)

for i in range(120):  # simulate 120 one-minute samples (all timestamped at collection time)
    # Simulated metrics
    cpu = 30 + np.random.normal(0, 10) + (i % 20) * 0.5  # Periodic load
    memory = 60 + np.random.normal(0, 5)
    latency = 50 + np.random.exponential(20)  # Long tail
    
    collector.record('cpu_percent', max(0, min(100, cpu)))
    collector.record('memory_percent', max(0, min(100, memory)))
    collector.record('api_latency_ms', latency)
    collector.increment('requests_total', np.random.randint(10, 100))
    collector.increment('errors_total', np.random.choice([0, 0, 0, 1]))

# Display statistics
print("System Metrics Summary (Last Hour)")
print("=" * 50)

for metric in ['cpu_percent', 'memory_percent', 'api_latency_ms']:
    stats = collector.get_stats(metric)
    print(f"\n{metric}:")
    print(f"  Mean: {stats['mean']:.1f}")
    print(f"  P50:  {stats['p50']:.1f}")
    print(f"  P95:  {stats['p95']:.1f}")
    print(f"  P99:  {stats['p99']:.1f}")
# Set up health checks
health = HealthChecker()

# Simulated health check functions
def check_database():
    # Simulate database connectivity check
    return {'healthy': random.random() > 0.05, 'message': 'Database connected'}

def check_broker_api():
    # Simulate broker API check
    return {'healthy': random.random() > 0.1, 'message': 'Broker API responding'}

def check_market_data():
    # Simulate market data feed check
    return {'healthy': random.random() > 0.02, 'message': 'Market data streaming'}

def check_disk_space():
    # Simulate disk space check
    usage = random.uniform(40, 80)
    return {
        'healthy': usage < 90,
        'message': f'Disk usage: {usage:.1f}%'
    }

health.register_check('database', check_database)
health.register_check('broker_api', check_broker_api)
health.register_check('market_data', check_market_data)
health.register_check('disk_space', check_disk_space)

# Run health checks
status = health.get_status()

print("\nHealth Check Results")
print("=" * 50)
print(f"Overall Status: {status['status'].upper()}")
print(f"Healthy: {status['healthy_checks']}/{status['total_checks']}")
print()

for name, result in status['checks'].items():
    status_icon = "✓" if result['healthy'] else "✗"
    print(f"  {status_icon} {name}: {result['message']} ({result['duration_ms']:.1f}ms)")

Exercise 18.1: Position Risk Health Check (Guided)

Create a health check that monitors position risk and returns unhealthy if any position exceeds a maximum size.

Exercise
Click for solution
def check_position_risk(positions: Dict[str, float], max_position_pct: float = 0.20) -> Dict:
    """
    Check if any position exceeds maximum allowed size.

    Args:
        positions: Dict of {symbol: position_value}
        max_position_pct: Maximum allowed position as % of total

    Returns:
        Health check result dict
    """
    if not positions:
        return {'healthy': True, 'message': 'No positions'}

    total_value = sum(abs(v) for v in positions.values())

    if total_value == 0:
        return {'healthy': True, 'message': 'No positions'}

    violations = []

    for symbol, value in positions.items():
        position_pct = abs(value) / total_value

        if position_pct > max_position_pct:
            violations.append(f"{symbol}: {position_pct:.1%}")

    if violations:
        return {
            'healthy': False,
            'message': f"Position limits exceeded: {', '.join(violations)}"
        }

    return {
        'healthy': True,
        'message': f'All positions within {max_position_pct:.0%} limit'
    }
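
A quick sanity check of the same limit logic on a hypothetical book (symbols and values invented for illustration):

```python
positions = {'AAPL': 50_000.0, 'MSFT': 30_000.0, 'TSLA': 120_000.0}  # hypothetical book

total = sum(abs(v) for v in positions.values())
violations = [s for s, v in positions.items() if abs(v) / total > 0.20]
print(violations)  # AAPL is 25% and TSLA is 60% of the 200k total, so both breach the 20% limit
```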

Section 18.2: Alerting Systems

Monitoring is useless if no one sees the problems. Alerting bridges that gap.

Alert Design Principles

  1. Actionable: Every alert should require action
  2. Urgent: Reserve alerts for time-sensitive issues
  3. Meaningful: Avoid alert fatigue from false positives
  4. Informative: Include enough context to diagnose
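Principle 4 is easiest to satisfy when the alert message pulls live values from the evaluation context, which is exactly what the `message_template.format(**context)` call in the manager below does. A minimal sketch with a hypothetical state snapshot:

```python
# Hypothetical snapshot of system state at evaluation time
context = {'cpu_percent': 97.3, 'host': 'api-1', 'error_rate': 0.012}

# A template that carries enough context to diagnose without opening a dashboard
template = 'CRITICAL: CPU at {cpu_percent:.1f}% on {host} (error rate {error_rate:.1%})'
print(template.format(**context))  # → CRITICAL: CPU at 97.3% on api-1 (error rate 1.2%)
```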
class AlertSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"

@dataclass
class Alert:
    """Represents a system alert."""
    name: str
    severity: AlertSeverity
    message: str
    timestamp: datetime = field(default_factory=datetime.now)
    labels: Dict = field(default_factory=dict)
    resolved: bool = False
    resolved_at: Optional[datetime] = None

@dataclass
class AlertRule:
    """Defines when an alert should fire."""
    name: str
    condition: Callable  # Function that returns True if alert should fire
    severity: AlertSeverity
    message_template: str
    cooldown_minutes: int = 5  # Minimum time between alerts
    last_fired: Optional[datetime] = None

class AlertManager:
    """
    Manages alert rules, firing, and notification.
    """
    
    def __init__(self):
        self.rules: List[AlertRule] = []
        self.active_alerts: List[Alert] = []
        self.alert_history: List[Alert] = []
        self.notification_channels: List[Callable] = []
    
    def add_rule(self, name: str, condition: Callable, severity: AlertSeverity,
                 message_template: str, cooldown_minutes: int = 5):
        """Add an alert rule."""
        rule = AlertRule(
            name=name,
            condition=condition,
            severity=severity,
            message_template=message_template,
            cooldown_minutes=cooldown_minutes
        )
        self.rules.append(rule)
    
    def add_notification_channel(self, channel: Callable):
        """Add a notification channel (function that receives alerts)."""
        self.notification_channels.append(channel)
    
    def check_rules(self, context: Dict) -> List[Alert]:
        """
        Check all rules and fire alerts as needed.
        
        Parameters:
        -----------
        context : dict
            Current system state for rule evaluation
        
        Returns:
        --------
        List[Alert] : New alerts that were fired
        """
        new_alerts = []
        now = datetime.now()
        
        for rule in self.rules:
            # Check cooldown
            if rule.last_fired:
                cooldown_end = rule.last_fired + timedelta(minutes=rule.cooldown_minutes)
                if now < cooldown_end:
                    continue
            
            # Evaluate condition (a failing condition is treated as "do not fire")
            try:
                should_fire = rule.condition(context)
            except Exception:
                should_fire = False
            
            if should_fire:
                # Create alert
                message = rule.message_template.format(**context)
                alert = Alert(
                    name=rule.name,
                    severity=rule.severity,
                    message=message
                )
                
                # Record and notify
                self.active_alerts.append(alert)
                self.alert_history.append(alert)
                new_alerts.append(alert)
                rule.last_fired = now
                
                # Send to notification channels
                for channel in self.notification_channels:
                    try:
                        channel(alert)
                    except Exception as e:
                        print(f"Notification failed: {e}")
        
        return new_alerts
    
    def resolve_alert(self, alert_name: str):
        """Resolve active alerts by name."""
        for alert in self.active_alerts:
            if alert.name == alert_name and not alert.resolved:
                alert.resolved = True
                alert.resolved_at = datetime.now()
        
        # Remove resolved alerts from active list
        self.active_alerts = [a for a in self.active_alerts if not a.resolved]
    
    def get_active_alerts(self) -> List[Alert]:
        """Get all active (unresolved) alerts."""
        return self.active_alerts
    
    def get_alert_summary(self) -> Dict:
        """Get summary of alert activity."""
        return {
            'active_count': len(self.active_alerts),
            'critical_count': sum(1 for a in self.active_alerts if a.severity == AlertSeverity.CRITICAL),
            'warning_count': sum(1 for a in self.active_alerts if a.severity == AlertSeverity.WARNING),
            'total_fired_24h': sum(
                1 for a in self.alert_history
                if a.timestamp > datetime.now() - timedelta(hours=24)
            ),
            'active_alerts': [
                {'name': a.name, 'severity': a.severity.value, 'message': a.message}
                for a in self.active_alerts
            ]
        }

# Create alert manager
alerts = AlertManager()

# Add notification channel (just print for demo)
def console_notification(alert: Alert):
    severity_emoji = {'info': 'ℹ️', 'warning': '⚠️', 'critical': '🚨'}
    emoji = severity_emoji.get(alert.severity.value, '📢')
    print(f"{emoji} [{alert.severity.value.upper()}] {alert.name}: {alert.message}")

alerts.add_notification_channel(console_notification)

# Add alert rules
alerts.add_rule(
    name='high_cpu',
    condition=lambda ctx: ctx.get('cpu_percent', 0) > 80,
    severity=AlertSeverity.WARNING,
    message_template='CPU usage at {cpu_percent:.1f}%',
    cooldown_minutes=5
)

alerts.add_rule(
    name='critical_cpu',
    condition=lambda ctx: ctx.get('cpu_percent', 0) > 95,
    severity=AlertSeverity.CRITICAL,
    message_template='CRITICAL: CPU at {cpu_percent:.1f}%!',
    cooldown_minutes=1
)

alerts.add_rule(
    name='high_error_rate',
    condition=lambda ctx: ctx.get('error_rate', 0) > 0.05,
    severity=AlertSeverity.WARNING,
    message_template='Error rate elevated: {error_rate:.1%}',
    cooldown_minutes=5
)

alerts.add_rule(
    name='large_drawdown',
    condition=lambda ctx: ctx.get('drawdown_pct', 0) > 0.10,
    severity=AlertSeverity.CRITICAL,
    message_template='Large drawdown: {drawdown_pct:.1%}',
    cooldown_minutes=15
)

print("Alert Rules Configured")
print("=" * 50)
for rule in alerts.rules:
    print(f"  {rule.severity.value.upper():8} {rule.name}")

# Simulate system states and check alerts
print("\nSimulating Alert Scenarios")
print("=" * 50)

# Normal state
print("\n1. Normal state:")
context_normal = {'cpu_percent': 45, 'error_rate': 0.01, 'drawdown_pct': 0.02}
new_alerts = alerts.check_rules(context_normal)
if not new_alerts:
    print("   No alerts fired")

# High CPU
print("\n2. High CPU:")
context_high_cpu = {'cpu_percent': 85, 'error_rate': 0.01, 'drawdown_pct': 0.02}
alerts.check_rules(context_high_cpu)

# Critical situation
print("\n3. Critical situation:")
context_critical = {'cpu_percent': 97, 'error_rate': 0.08, 'drawdown_pct': 0.12}
alerts.check_rules(context_critical)

# Summary
print("\nAlert Summary:")
summary = alerts.get_alert_summary()
print(f"  Active alerts: {summary['active_count']}")
print(f"  Critical: {summary['critical_count']}")
print(f"  Warning: {summary['warning_count']}")
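
Note how the cooldown check at the top of `check_rules` keeps a noisy condition from re-firing on every evaluation. A minimal standalone sketch of that logic (the real `AlertManager` stores `last_fired` on each rule):

```python
from datetime import datetime, timedelta

def outside_cooldown(last_fired, cooldown_minutes, now):
    """True if the rule may fire again (mirrors the check in check_rules)."""
    if last_fired is None:
        return True  # rule has never fired
    return now >= last_fired + timedelta(minutes=cooldown_minutes)

now = datetime(2024, 1, 1, 12, 0)
fired = now - timedelta(minutes=2)
print(outside_cooldown(fired, 5, now))  # False: still inside the 5-minute window
print(outside_cooldown(fired, 1, now))  # True: 1-minute cooldown has expired
print(outside_cooldown(None, 5, now))   # True: rule has never fired
```

This is why the `critical_cpu` rule (1-minute cooldown) can re-fire far more often than `large_drawdown` (15 minutes).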

Exercise 18.2: Alert Rule Builder (Guided)

Build a function that creates alert rules with proper threshold configuration.

Exercise
Click for solution
def create_threshold_alert(metric_name: str, warning_threshold: float,
                          critical_threshold: float, comparison: str = 'above') -> List[Dict]:
    """
    Create warning and critical alert rules for a metric.

    Args:
        metric_name: Name of the metric to monitor
        warning_threshold: Threshold for warning alerts
        critical_threshold: Threshold for critical alerts
        comparison: 'above' or 'below'

    Returns:
        List of alert rule configurations
    """
    rules = []

    if comparison == 'above':
        warning_condition = lambda ctx: ctx.get(metric_name, 0) > warning_threshold
        critical_condition = lambda ctx: ctx.get(metric_name, 0) > critical_threshold
    else:
        warning_condition = lambda ctx: ctx.get(metric_name, float('inf')) < warning_threshold
        critical_condition = lambda ctx: ctx.get(metric_name, float('inf')) < critical_threshold

    warning_rule = {
        'name': f'{metric_name}_warning',
        'condition': warning_condition,
        'severity': AlertSeverity.WARNING,
        'message_template': f'{metric_name} {{{metric_name}}} {comparison} warning threshold ({warning_threshold})'
    }

    critical_rule = {
        'name': f'{metric_name}_critical',
        'condition': critical_condition,
        'severity': AlertSeverity.CRITICAL,
        'message_template': f'{metric_name} {{{metric_name}}} {comparison} critical threshold ({critical_threshold})'
    }

    rules.append(warning_rule)
    rules.append(critical_rule)

    return rules
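
One subtlety in the solution is the brace escaping in `message_template`: the template must end up containing a literal `{metric_name}` placeholder so that `check_rules` can fill it later with `str.format(**context)`. A quick standalone check with an illustrative metric and threshold:

```python
# Build a template the way the solution does; the escaped braces survive
# the f-string as a literal "{latency}" placeholder.
metric_name, comparison, warning_threshold = 'latency', 'above', 200
template = f'{metric_name} {{{metric_name}}} {comparison} warning threshold ({warning_threshold})'
print(template)                      # latency {latency} above warning threshold (200)
print(template.format(latency=250))  # latency 250 above warning threshold (200)
```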

Section 18.3: Incident Response

When things go wrong (and they will), having a systematic response process is crucial.

Incident Lifecycle

  1. Detection: Alert fires or user reports issue
  2. Triage: Assess severity and impact
  3. Response: Follow runbook, mitigate impact
  4. Resolution: Fix the root cause
  5. Post-mortem: Learn and prevent recurrence
class IncidentSeverity(Enum):
    SEV1 = "sev1"  # Critical: Trading halted, major financial impact
    SEV2 = "sev2"  # High: Degraded performance, significant impact
    SEV3 = "sev3"  # Medium: Minor issues, limited impact
    SEV4 = "sev4"  # Low: Cosmetic issues, no financial impact

@dataclass
class Incident:
    """Represents a system incident."""
    id: str
    title: str
    severity: IncidentSeverity
    description: str
    created_at: datetime = field(default_factory=datetime.now)
    resolved_at: Optional[datetime] = None
    status: str = "open"  # open, investigating, mitigating, resolved
    timeline: List[Dict] = field(default_factory=list)
    affected_systems: List[str] = field(default_factory=list)
    impact: str = ""
    root_cause: str = ""
    resolution: str = ""

@dataclass
class Runbook:
    """A runbook for incident response."""
    name: str
    description: str
    symptoms: List[str]
    steps: List[Dict]  # {step: str, expected_outcome: str}
    escalation: str
    estimated_time_minutes: int

class IncidentManager:
    """
    Manages incident lifecycle and documentation.
    """
    
    def __init__(self):
        self.incidents: Dict[str, Incident] = {}
        self.runbooks: Dict[str, Runbook] = {}
        self._incident_counter = 0
    
    def create_incident(self, title: str, severity: IncidentSeverity,
                       description: str, affected_systems: Optional[List[str]] = None) -> Incident:
        """Create a new incident."""
        self._incident_counter += 1
        incident_id = f"INC-{self._incident_counter:04d}"
        
        incident = Incident(
            id=incident_id,
            title=title,
            severity=severity,
            description=description,
            affected_systems=affected_systems or []
        )
        
        # Add creation to timeline
        incident.timeline.append({
            'time': datetime.now().isoformat(),
            'event': 'Incident created',
            'details': description
        })
        
        self.incidents[incident_id] = incident
        return incident
    
    def update_status(self, incident_id: str, new_status: str, notes: str = ""):
        """Update incident status."""
        if incident_id not in self.incidents:
            raise ValueError(f"Incident {incident_id} not found")
        
        incident = self.incidents[incident_id]
        old_status = incident.status
        incident.status = new_status
        
        incident.timeline.append({
            'time': datetime.now().isoformat(),
            'event': f'Status changed: {old_status} -> {new_status}',
            'details': notes
        })
        
        if new_status == 'resolved':
            incident.resolved_at = datetime.now()
    
    def add_timeline_entry(self, incident_id: str, event: str, details: str = ""):
        """Add an entry to the incident timeline."""
        if incident_id not in self.incidents:
            raise ValueError(f"Incident {incident_id} not found")
        
        self.incidents[incident_id].timeline.append({
            'time': datetime.now().isoformat(),
            'event': event,
            'details': details
        })
    
    def add_runbook(self, runbook: Runbook):
        """Add a runbook to the library."""
        self.runbooks[runbook.name] = runbook
    
    def find_runbook(self, symptoms: List[str]) -> Optional[Runbook]:
        """Find the runbook that best matches the given symptoms."""
        best: Optional[Runbook] = None
        best_score = 0
        for runbook in self.runbooks.values():
            score = sum(1 for s in symptoms if any(s.lower() in rs.lower() for rs in runbook.symptoms))
            if score > best_score:
                best, best_score = runbook, score
        return best
    
    def generate_postmortem(self, incident_id: str) -> str:
        """Generate a post-mortem report for an incident."""
        if incident_id not in self.incidents:
            raise ValueError(f"Incident {incident_id} not found")
        
        incident = self.incidents[incident_id]
        
        duration = "Ongoing"
        if incident.resolved_at:
            duration_mins = (incident.resolved_at - incident.created_at).total_seconds() / 60
            duration = f"{duration_mins:.0f} minutes"
        
        report = f"""
# Post-Mortem Report: {incident.id}

## Summary
- **Title**: {incident.title}
- **Severity**: {incident.severity.value.upper()}
- **Duration**: {duration}
- **Status**: {incident.status}

## Impact
{incident.impact or 'Not documented'}

## Affected Systems
{', '.join(incident.affected_systems) or 'Not specified'}

## Timeline
"""
        for entry in incident.timeline:
            report += f"- **{entry['time']}**: {entry['event']}\n"
            if entry.get('details'):
                report += f"  - {entry['details']}\n"
        
        report += f"""
## Root Cause
{incident.root_cause or 'Under investigation'}

## Resolution
{incident.resolution or 'Not yet resolved'}

## Action Items
- [ ] Document lessons learned
- [ ] Update monitoring/alerting
- [ ] Review and update runbooks
- [ ] Schedule follow-up review
"""
        return report

# Create incident manager and add runbooks
incidents = IncidentManager()

# Add some runbooks
runbook_db = Runbook(
    name='database_connection_failure',
    description='Steps to diagnose and recover from database connection issues',
    symptoms=['database connection', 'connection refused', 'timeout', 'postgres'],
    steps=[
        {'step': 'Check database server status', 'expected_outcome': 'Server should be running'},
        {'step': 'Verify network connectivity', 'expected_outcome': 'Ping should succeed'},
        {'step': 'Check connection pool exhaustion', 'expected_outcome': 'Connections < max_pool'},
        {'step': 'Review database logs', 'expected_outcome': 'Identify error messages'},
        {'step': 'Restart connection pool if needed', 'expected_outcome': 'Connections restored'},
    ],
    escalation='If not resolved in 15 minutes, page on-call DBA',
    estimated_time_minutes=30
)

runbook_api = Runbook(
    name='broker_api_failure',
    description='Steps to handle broker API failures',
    symptoms=['broker', 'api error', '401', '403', '500', 'order rejected'],
    steps=[
        {'step': 'Check broker status page', 'expected_outcome': 'Identify if broker-wide issue'},
        {'step': 'Verify API credentials', 'expected_outcome': 'Credentials should be valid'},
        {'step': 'Check rate limits', 'expected_outcome': 'Should be within limits'},
        {'step': 'Enable backup broker if available', 'expected_outcome': 'Orders route to backup'},
        {'step': 'Pause new order submission', 'expected_outcome': 'Prevent further failures'},
    ],
    escalation='Contact broker support and notify risk team',
    estimated_time_minutes=15
)

incidents.add_runbook(runbook_db)
incidents.add_runbook(runbook_api)

print("Runbooks Loaded:")
for name, rb in incidents.runbooks.items():
    print(f"  - {name}: {rb.description}")
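
The `find_runbook` lookup matches substrings in one direction: a reported symptom matches only if it appears, case-insensitively, inside one of the runbook's listed symptom strings. A standalone sketch with illustrative data:

```python
# Symptom library keyed by runbook name (illustrative entries).
library = {
    'database_connection_failure': ['database connection', 'connection refused', 'timeout'],
    'broker_api_failure': ['broker', 'api error', '401', '500'],
}

def matches(symptom: str, known: list) -> bool:
    # The reported symptom must be a substring of a known symptom string.
    return any(symptom.lower() in k.lower() for k in known)

print(matches('timeout', library['database_connection_failure']))  # True
print(matches('401', library['broker_api_failure']))               # True
print(matches('Connection timeout on postgres',
              library['database_connection_failure']))             # False: full sentences do not match
```

This is why short, keyword-like symptoms such as `'timeout'` match while verbose descriptions do not.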

Exercise 18.3: Backup Job Configuration (Guided)

Create a function that validates backup job configurations and calculates retention requirements.

Exercise
Click for solution
def validate_backup_config(jobs: List[Dict]) -> Dict:
    """
    Validate backup job configurations and calculate storage needs.

    Args:
        jobs: List of backup job configurations

    Returns:
        Validation results and storage estimates
    """
    results = {
        'valid': True,
        'errors': [],
        'warnings': [],
        'jobs': [],
        'total_daily_storage_gb': 0,
        'total_retention_storage_gb': 0
    }

    required_fields = ['name', 'frequency_hours', 'retention_days', 'estimated_size_gb']

    for job in jobs:
        job_result = {'name': job.get('name', 'unknown'), 'valid': True}

        # Avoid shadowing dataclasses.field imported at module level
        for field_name in required_fields:
            if field_name not in job:
                job_result['valid'] = False
                results['errors'].append(f"{job_result['name']}: missing {field_name}")

        if not job_result['valid']:
            results['jobs'].append(job_result)
            continue

        backups_per_day = 24 / job['frequency_hours']

        daily_storage = backups_per_day * job['estimated_size_gb']
        job_result['daily_storage_gb'] = daily_storage

        retention_storage = daily_storage * job['retention_days']
        job_result['retention_storage_gb'] = retention_storage

        results['total_daily_storage_gb'] += daily_storage
        results['total_retention_storage_gb'] += retention_storage

        if job['retention_days'] < 7:
            results['warnings'].append(f"{job['name']}: retention less than 7 days")

        if backups_per_day < 1:
            results['warnings'].append(f"{job['name']}: backup frequency > 24 hours")

        results['jobs'].append(job_result)

    results['valid'] = len(results['errors']) == 0

    return results
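
The storage arithmetic inside the solution is worth seeing on its own. For one illustrative job:

```python
# One illustrative backup job: every 4 hours, 90-day retention, 2 GB per backup.
frequency_hours, retention_days, size_gb = 4, 90, 2.0

backups_per_day = 24 / frequency_hours                    # 6 backups per day
daily_storage_gb = backups_per_day * size_gb              # 12 GB written per day
retention_storage_gb = daily_storage_gb * retention_days  # 1080 GB retained

print(backups_per_day, daily_storage_gb, retention_storage_gb)  # 6.0 12.0 1080.0
```

Over a full retention window, even a modest 2 GB backup compounds to roughly a terabyte, which is why the validator estimates storage before jobs go live.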

Section 18.4: Backup & Recovery

Data is your most valuable asset. Losing trade history, positions, or configuration can be catastrophic.

Backup Strategy

| Data Type | Backup Frequency | Retention | Recovery Time Objective |
|---|---|---|---|
| Trade database | Continuous (replication) | 90 days | < 1 hour |
| Configuration | On change | Forever | < 15 minutes |
| Market data | Daily | 30 days | < 4 hours |
| Logs | Daily | 7 days | < 1 hour |

@dataclass
class BackupJob:
    """Represents a backup job."""
    name: str
    source: str
    destination: str
    schedule: str  # cron expression or description
    retention_days: int
    last_run: Optional[datetime] = None
    last_status: str = "never_run"
    last_size_mb: float = 0

@dataclass
class Backup:
    """Represents a completed backup."""
    job_name: str
    timestamp: datetime
    path: str
    size_mb: float
    checksum: str
    metadata: Dict = field(default_factory=dict)

class BackupManager:
    """
    Manages backup jobs and recovery.
    """
    
    def __init__(self):
        self.jobs: Dict[str, BackupJob] = {}
        self.backups: List[Backup] = []
    
    def add_job(self, job: BackupJob):
        """Add a backup job."""
        self.jobs[job.name] = job
    
    def simulate_backup(self, job_name: str) -> Backup:
        """
        Simulate running a backup job.
        """
        if job_name not in self.jobs:
            raise ValueError(f"Job {job_name} not found")
        
        job = self.jobs[job_name]
        
        # Simulate backup creation
        timestamp = datetime.now()
        size_mb = random.uniform(10, 500)  # Simulated size
        checksum = f"sha256:{random.getrandbits(256):064x}"  # simulated digest
        
        backup = Backup(
            job_name=job_name,
            timestamp=timestamp,
            path=f"{job.destination}/{job_name}_{timestamp.strftime('%Y%m%d_%H%M%S')}.backup",
            size_mb=size_mb,
            checksum=checksum,
            metadata={
                'source': job.source,
                'compression': 'gzip',
                'encrypted': True
            }
        )
        
        # Update job status
        job.last_run = timestamp
        job.last_status = 'success'
        job.last_size_mb = size_mb
        
        self.backups.append(backup)
        
        return backup
    
    def list_backups(self, job_name: Optional[str] = None, days: int = 7) -> List[Backup]:
        """List recent backups."""
        cutoff = datetime.now() - timedelta(days=days)
        
        backups = [
            b for b in self.backups
            if b.timestamp > cutoff and (job_name is None or b.job_name == job_name)
        ]
        
        return sorted(backups, key=lambda b: b.timestamp, reverse=True)
    
    def get_backup_report(self) -> Dict:
        """Generate backup status report."""
        report = {
            'timestamp': datetime.now().isoformat(),
            'jobs': [],
            'total_backups': len(self.backups),
            'total_size_gb': sum(b.size_mb for b in self.backups) / 1024
        }
        
        for name, job in self.jobs.items():
            recent_backups = self.list_backups(name, days=7)
            
            job_report = {
                'name': name,
                'schedule': job.schedule,
                'last_run': job.last_run.isoformat() if job.last_run else 'Never',
                'last_status': job.last_status,
                'backups_last_7d': len(recent_backups),
                'retention_days': job.retention_days
            }
            
            report['jobs'].append(job_report)
        
        return report

# Create backup manager and jobs
backups = BackupManager()

# Add backup jobs
backups.add_job(BackupJob(
    name='trade_database',
    source='postgresql://localhost:5432/trading',
    destination='s3://backups/database',
    schedule='0 */4 * * *',  # Every 4 hours
    retention_days=90
))

backups.add_job(BackupJob(
    name='configuration',
    source='/etc/trading/',
    destination='s3://backups/config',
    schedule='On change',
    retention_days=365
))

# Simulate some backups
for job_name in backups.jobs.keys():
    for _ in range(3):
        backups.simulate_backup(job_name)

# Display report
report = backups.get_backup_report()

print("Backup Status Report")
print("=" * 60)
print(f"Total Backups: {report['total_backups']}")
print(f"Total Size: {report['total_size_gb']:.2f} GB")
print()

for job in report['jobs']:
    status_icon = "✓" if job['last_status'] == 'success' else "✗"
    print(f"{status_icon} {job['name']}")
    print(f"    Schedule: {job['schedule']}")
    print(f"    Last Run: {job['last_run']}")
    print(f"    Retention: {job['retention_days']} days")
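
A backup is only useful if it restores cleanly. The `checksum` recorded on each `Backup` enables a verification step before any restore; a sketch of that check (the helper name `verify_backup` is illustrative, not part of `BackupManager`):

```python
import hashlib

def verify_backup(data: bytes, recorded_checksum: str) -> bool:
    """Recompute the SHA-256 digest and compare against the recorded value."""
    digest = hashlib.sha256(data).hexdigest()
    return recorded_checksum == f"sha256:{digest}"

payload = b"trade history snapshot"
checksum = "sha256:" + hashlib.sha256(payload).hexdigest()
print(verify_backup(payload, checksum))             # True: intact
print(verify_backup(b"corrupted bytes", checksum))  # False: abort the restore
```

Running this check on a schedule (not just at restore time) is how silent corruption gets caught while there is still a good copy to fall back on.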

Exercise 18.4: Complete Monitoring System (Open-ended)

Build a comprehensive monitoring system that tracks system health, collects metrics, and generates status reports.

Exercise
Click for solution
class MonitoringSystem:
    """
    Comprehensive system monitoring with metrics, health checks, and reporting.
    """

    def __init__(self, name: str, retention_hours: int = 24):
        self.name = name
        self.retention_hours = retention_hours
        self.metrics: Dict[str, List[Dict]] = {}
        self.health_checks: Dict[str, Callable] = {}
        self.health_results: Dict[str, Dict] = {}

    def record_metric(self, name: str, value: float, metric_type: str = 'gauge'):
        """Record a metric value."""
        if name not in self.metrics:
            self.metrics[name] = []

        self.metrics[name].append({
            'value': value,
            'timestamp': datetime.now(),
            'type': metric_type
        })

        # Cleanup old entries
        cutoff = datetime.now() - timedelta(hours=self.retention_hours)
        self.metrics[name] = [m for m in self.metrics[name] if m['timestamp'] > cutoff]

    def get_metric_stats(self, name: str, hours: float = 1.0) -> Dict:
        """Get statistics for a metric."""
        if name not in self.metrics:
            return {}

        cutoff = datetime.now() - timedelta(hours=hours)
        values = [m['value'] for m in self.metrics[name] if m['timestamp'] > cutoff]

        if not values:
            return {}

        return {
            'current': values[-1],
            'min': min(values),
            'max': max(values),
            'mean': sum(values) / len(values),
            'count': len(values)
        }

    def register_health_check(self, name: str, check_func: Callable):
        """Register a health check function."""
        self.health_checks[name] = check_func

    def run_health_checks(self) -> Dict:
        """Run all health checks."""
        self.health_results = {}

        for name, check_func in self.health_checks.items():
            try:
                result = check_func()
                self.health_results[name] = {
                    'healthy': result.get('healthy', True),
                    'message': result.get('message', 'OK'),
                    'timestamp': datetime.now().isoformat()
                }
            except Exception as e:
                self.health_results[name] = {
                    'healthy': False,
                    'message': str(e),
                    'timestamp': datetime.now().isoformat()
                }

        return self.health_results

    def is_healthy(self) -> bool:
        """Check if system is healthy."""
        if not self.health_results:
            self.run_health_checks()
        return all(r['healthy'] for r in self.health_results.values())

    def generate_report(self) -> str:
        """Generate status report."""
        self.run_health_checks()

        lines = [
            f"MONITORING REPORT: {self.name}",
            "=" * 50,
            f"Time: {datetime.now().isoformat()}",
            f"Overall Health: {'HEALTHY' if self.is_healthy() else 'UNHEALTHY'}",
            "",
            "HEALTH CHECKS:"
        ]

        for name, result in self.health_results.items():
            icon = "✓" if result['healthy'] else "✗"
            lines.append(f"  {icon} {name}: {result['message']}")

        lines.append("\nMETRICS:")
        for name in self.metrics.keys():
            stats = self.get_metric_stats(name)
            if stats:
                lines.append(f"  {name}: current={stats['current']:.1f}, mean={stats['mean']:.1f}")

        return "\n".join(lines)

# Test
monitor = MonitoringSystem("Trading System")

# Add health checks
monitor.register_health_check('database', lambda: {'healthy': True, 'message': 'Connected'})
monitor.register_health_check('api', lambda: {'healthy': True, 'message': 'Responding'})

# Record metrics
for _ in range(10):
    monitor.record_metric('cpu', random.uniform(30, 70))
    monitor.record_metric('memory', random.uniform(50, 80))

# Generate report
print(monitor.generate_report())
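
`get_metric_stats` reports min, mean, and max, but no tail percentiles, while the dashboard in Exercise 18.6 reports a p95. A minimal nearest-rank percentile (one common convention; interpolating methods also exist) that could back such a stat:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

# Illustrative latency samples (milliseconds).
latencies = [12, 15, 14, 100, 13, 16, 11, 18, 17, 250]
print(percentile(latencies, 50))  # 15
print(percentile(latencies, 95))  # 250
```

Note how the mean of these samples is dragged up by two outliers while the median stays low; tail percentiles are what reveal that latency spikes exist at all.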

Exercise 18.5: Incident Management System (Open-ended)

Create a comprehensive incident management system with runbooks and post-mortem generation.

Exercise
Click for solution
class IncidentManagementSystem:
    """
    Comprehensive incident management with runbooks and reporting.
    """

    def __init__(self):
        self.incidents = {}
        self.runbooks = {}
        self._counter = 0

    def add_runbook(self, name: str, symptoms: List[str], steps: List[str],
                   escalation: str):
        """Add a runbook to the library."""
        self.runbooks[name] = {
            'symptoms': symptoms,
            'steps': steps,
            'escalation': escalation
        }

    def find_runbook(self, symptoms: List[str]) -> Optional[str]:
        """Find matching runbook for symptoms."""
        for name, runbook in self.runbooks.items():
            for symptom in symptoms:
                if any(symptom.lower() in rb_symptom.lower() 
                       for rb_symptom in runbook['symptoms']):
                    return name
        return None

    def create_incident(self, title: str, severity: str, description: str,
                       affected_systems: Optional[List[str]] = None) -> str:
        """Create a new incident."""
        self._counter += 1
        incident_id = f"INC-{self._counter:04d}"

        self.incidents[incident_id] = {
            'title': title,
            'severity': severity,
            'description': description,
            'status': 'open',
            'affected_systems': affected_systems or [],
            'created_at': datetime.now(),
            'resolved_at': None,
            'timeline': [{
                'time': datetime.now().isoformat(),
                'event': 'Incident created',
                'details': description
            }],
            'root_cause': '',
            'resolution': ''
        }

        return incident_id

    def update_status(self, incident_id: str, status: str, notes: str = ""):
        """Update incident status."""
        if incident_id not in self.incidents:
            raise ValueError(f"Incident {incident_id} not found")

        incident = self.incidents[incident_id]
        old_status = incident['status']
        incident['status'] = status

        incident['timeline'].append({
            'time': datetime.now().isoformat(),
            'event': f'Status: {old_status} -> {status}',
            'details': notes
        })

        if status == 'resolved':
            incident['resolved_at'] = datetime.now()

    def add_note(self, incident_id: str, event: str, details: str = ""):
        """Add timeline note to incident."""
        if incident_id not in self.incidents:
            raise ValueError(f"Incident {incident_id} not found")

        self.incidents[incident_id]['timeline'].append({
            'time': datetime.now().isoformat(),
            'event': event,
            'details': details
        })

    def generate_postmortem(self, incident_id: str) -> str:
        """Generate post-mortem report."""
        if incident_id not in self.incidents:
            raise ValueError(f"Incident {incident_id} not found")

        inc = self.incidents[incident_id]

        duration = "Ongoing"
        if inc['resolved_at']:
            mins = (inc['resolved_at'] - inc['created_at']).total_seconds() / 60
            duration = f"{mins:.0f} minutes"

        lines = [
            f"POST-MORTEM: {incident_id}",
            "=" * 50,
            f"Title: {inc['title']}",
            f"Severity: {inc['severity']}",
            f"Duration: {duration}",
            f"Status: {inc['status']}",
            "",
            "TIMELINE:"
        ]

        for entry in inc['timeline']:
            lines.append(f"  {entry['time']}: {entry['event']}")
            if entry.get('details'):
                lines.append(f"    -> {entry['details']}")

        lines.extend([
            "",
            f"ROOT CAUSE: {inc['root_cause'] or 'TBD'}",
            f"RESOLUTION: {inc['resolution'] or 'TBD'}"
        ])

        return "\n".join(lines)

# Test
ims = IncidentManagementSystem()

# Add runbook
ims.add_runbook(
    'database_issues',
    symptoms=['database', 'connection', 'timeout'],
    steps=['Check server status', 'Verify connectivity', 'Restart if needed'],
    escalation='Page DBA'
)

# Create and manage incident
inc_id = ims.create_incident(
    title='Database Connection Issues',
    severity='SEV2',
    description='Multiple connection timeouts',
    affected_systems=['api', 'orders']
)

ims.update_status(inc_id, 'investigating', 'Engineer assigned')
ims.add_note(inc_id, 'Root cause identified', 'Connection pool exhausted')
ims.incidents[inc_id]['root_cause'] = 'Connection pool exhaustion'
ims.incidents[inc_id]['resolution'] = 'Restarted services'
ims.update_status(inc_id, 'resolved', 'Services restored')

print(ims.generate_postmortem(inc_id))
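
Post-mortems aggregate naturally into operational metrics. A sketch of mean time to resolve (MTTR) across resolved incidents, using the same duration arithmetic as `generate_postmortem` (timestamps are illustrative):

```python
from datetime import datetime

# (created_at, resolved_at) pairs for resolved incidents -- illustrative data.
resolved = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 45)),    # 45 minutes
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 30)),  # 30 minutes
]

durations_min = [(end - start).total_seconds() / 60 for start, end in resolved]
mttr = sum(durations_min) / len(durations_min)
print(f"MTTR: {mttr:.1f} minutes")  # (45 + 30) / 2 = 37.5
```

Tracking MTTR per severity level over time is a simple way to check whether runbooks and alerting improvements are actually shortening incidents.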

Exercise 18.6: Operations Dashboard (Open-ended)

Build a unified operations dashboard that combines monitoring, alerts, incidents, and backups into a single view.

Exercise
Click for solution
class OperationsDashboard:
    """
    Unified operations dashboard combining all monitoring components.
    """

    def __init__(self, name: str):
        self.name = name
        self.metrics = MetricsCollector()
        self.health = HealthChecker()
        self.alerts = AlertManager()
        self.backups = BackupManager()
        self.active_incidents = []

    def collect_metrics(self, metrics_data: Dict):
        """Collect system metrics."""
        for name, value in metrics_data.items():
            self.metrics.record(name, value)
        self.alerts.check_rules(metrics_data)

    def add_health_check(self, name: str, check_func: Callable):
        """Register a health check."""
        self.health.register_check(name, check_func)

    def add_backup_job(self, job: BackupJob):
        """Add a backup job."""
        self.backups.add_job(job)

    def get_overall_status(self) -> str:
        """Determine overall system status."""
        # Check for critical alerts
        alert_summary = self.alerts.get_alert_summary()
        if alert_summary['critical_count'] > 0:
            return 'critical'

        # Check health
        if not self.health.is_healthy():
            return 'warning'

        # Check for warnings
        if alert_summary['warning_count'] > 0:
            return 'warning'

        return 'healthy'

    def generate_dashboard(self) -> str:
        """Generate comprehensive dashboard display."""
        status = self.get_overall_status()
        status_emoji = {'healthy': '🟢', 'warning': '🟡', 'critical': '🔴'}

        lines = [
            "=" * 60,
            f"OPERATIONS DASHBOARD: {self.name}",
            f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
            "=" * 60,
            "",
            f"OVERALL STATUS: {status_emoji.get(status, '⚪')} {status.upper()}",
            ""
        ]

        # Health checks
        lines.append("HEALTH CHECKS")
        lines.append("-" * 40)
        health_status = self.health.get_status()
        lines.append(f"Status: {health_status['healthy_checks']}/{health_status['total_checks']} healthy")
        for name, result in health_status.get('checks', {}).items():
            icon = "✓" if result['healthy'] else "✗"
            lines.append(f"  {icon} {name}: {result['message']}")
        lines.append("")

        # Alerts
        lines.append("ALERTS")
        lines.append("-" * 40)
        alert_summary = self.alerts.get_alert_summary()
        lines.append(f"Active: {alert_summary['active_count']} (Critical: {alert_summary['critical_count']}, Warning: {alert_summary['warning_count']})")
        for alert in alert_summary.get('active_alerts', [])[:5]:
            lines.append(f"  [{alert['severity'].upper()}] {alert['name']}: {alert['message']}")
        lines.append("")

        # Key metrics
        lines.append("KEY METRICS")
        lines.append("-" * 40)
        for metric_name in ['cpu', 'memory', 'latency']:
            stats = self.metrics.get_stats(metric_name)
            if stats:
                lines.append(f"  {metric_name}: mean={stats['mean']:.1f}, p95={stats['p95']:.1f}")
        lines.append("")

        # Backups
        lines.append("BACKUPS")
        lines.append("-" * 40)
        backup_report = self.backups.get_backup_report()
        lines.append(f"Total: {backup_report['total_backups']} backups")
        for job in backup_report['jobs']:
            icon = "✓" if job['last_status'] == 'success' else "✗"
            lines.append(f"  {icon} {job['name']}: {job['last_status']}")

        lines.append("")
        lines.append("=" * 60)

        return "\n".join(lines)

# Test
dashboard = OperationsDashboard("Quant Trading System")

# Setup health checks
dashboard.add_health_check('database', lambda: {'healthy': True, 'message': 'Connected'})
dashboard.add_health_check('broker', lambda: {'healthy': True, 'message': 'API responding'})

# Setup alert rules
dashboard.alerts.add_rule(
    'high_cpu', lambda ctx: ctx.get('cpu', 0) > 80,
    AlertSeverity.WARNING, 'CPU at {cpu}%'
)

# Add backup job
dashboard.add_backup_job(BackupJob(
    name='database',
    source='postgres',
    destination='s3://backups',
    schedule='0 */4 * * *',
    retention_days=90
))
dashboard.backups.simulate_backup('database')

# Collect metrics
for _ in range(30):
    dashboard.collect_metrics({
        'cpu': random.uniform(30, 70),
        'memory': random.uniform(50, 80),
        'latency': random.expovariate(1/50) + 30  # stdlib random has no exponential(); expovariate(1/50) has mean 50
    })

print(dashboard.generate_dashboard())

Module Project: Complete Operations System

Build a comprehensive operations system that brings together all monitoring, alerting, and incident management components.

class TradingOperationsSystem:
    """
    Complete operations system for trading infrastructure.
    
    Features:
    - Real-time metrics collection
    - Health monitoring
    - Alert management
    - Incident tracking
    - Backup management
    """
    
    def __init__(self, name: str = "Trading Operations"):
        self.name = name
        self.metrics = MetricsCollector()
        self.health = HealthChecker()
        self.alerts = AlertManager()
        self.incidents = IncidentManager()
        self.backups = BackupManager()
        
        self._setup_defaults()
    
    def _setup_defaults(self):
        """Setup default monitoring configuration."""
        # Default health checks
        self.health.register_check('database', lambda: {'healthy': True, 'message': 'Connected'})
        self.health.register_check('broker_api', lambda: {'healthy': True, 'message': 'Responding'})
        self.health.register_check('market_data', lambda: {'healthy': True, 'message': 'Streaming'})
        
        # Default alert rules
        self.alerts.add_rule(
            'high_cpu', lambda ctx: ctx.get('cpu', 0) > 80,
            AlertSeverity.WARNING, 'CPU at {cpu:.1f}%'
        )
        self.alerts.add_rule(
            'high_latency', lambda ctx: ctx.get('latency', 0) > 500,
            AlertSeverity.WARNING, 'Latency at {latency:.0f}ms'
        )
        self.alerts.add_rule(
            'large_drawdown', lambda ctx: ctx.get('drawdown', 0) > 0.10,
            AlertSeverity.CRITICAL, 'Drawdown at {drawdown:.1%}'
        )
        
        # Default backup jobs
        self.backups.add_job(BackupJob(
            name='trade_database',
            source='postgresql://localhost/trading',
            destination='s3://backups/db',
            schedule='0 */4 * * *',
            retention_days=90
        ))
    
    def collect_metrics(self, data: Dict):
        """Collect system metrics and check alerts."""
        for name, value in data.items():
            self.metrics.record(name, value)
        self.alerts.check_rules(data)
    
    def get_system_status(self) -> Dict:
        """Get comprehensive system status."""
        health_status = self.health.get_status()
        alert_summary = self.alerts.get_alert_summary()
        
        # Determine overall status
        if alert_summary['critical_count'] > 0:
            overall = 'critical'
        elif alert_summary['warning_count'] > 0 or health_status['status'] != 'healthy':
            overall = 'warning'
        else:
            overall = 'healthy'
        
        return {
            'timestamp': datetime.now().isoformat(),
            'overall_status': overall,
            'health': health_status,
            'alerts': alert_summary,
            'backup_status': self.backups.get_backup_report()
        }
    
    def generate_dashboard(self) -> str:
        """Generate text-based dashboard."""
        status = self.get_system_status()
        emoji = {'healthy': '🟢', 'warning': '🟡', 'critical': '🔴'}
        
        lines = [
            "=" * 60,
            f"OPERATIONS DASHBOARD: {self.name}",
            f"Time: {status['timestamp']}",
            "=" * 60,
            "",
            f"OVERALL: {emoji.get(status['overall_status'], '⚪')} {status['overall_status'].upper()}",
            "",
            "HEALTH CHECKS:"
        ]
        
        for name, result in status['health'].get('checks', {}).items():
            icon = "✓" if result['healthy'] else "✗"
            lines.append(f"  {icon} {name}: {result['message']}")
        
        lines.append("\nALERTS:")
        alerts = status['alerts']
        lines.append(f"  Active: {alerts['active_count']} (Critical: {alerts['critical_count']}, Warning: {alerts['warning_count']})")
        
        lines.append("\nBACKUPS:")
        for job in status['backup_status']['jobs']:
            icon = "✓" if job['last_status'] == 'success' else "✗"
            lines.append(f"  {icon} {job['name']}: {job['last_status']}")
        
        lines.append("\n" + "=" * 60)
        
        return "\n".join(lines)

# Create and test system
ops = TradingOperationsSystem("Quant Trading Platform")

# Simulate operations
for _ in range(30):
    ops.collect_metrics({
        'cpu': random.uniform(30, 70),
        'memory': random.uniform(50, 80),
        'latency': random.expovariate(1/50) + 30,  # stdlib random has no exponential(); expovariate(1/50) has mean 50
        'drawdown': random.uniform(0, 0.05)
    })

# Run backup
ops.backups.simulate_backup('trade_database')

# Display dashboard
print(ops.generate_dashboard())

Key Takeaways

System Monitoring

  • Monitor the Four Golden Signals: latency, traffic, errors, saturation
  • Health checks should be fast and deterministic
  • Track both infrastructure and business metrics
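As a minimal sketch of the percentile summaries shown in the dashboard (the helper name is hypothetical), a p95 can be computed from raw samples with the nearest-rank method:

```python
import statistics

def summarize(samples):
    """Mean and 95th-percentile (nearest-rank) summary of metric samples."""
    ordered = sorted(samples)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)  # nearest-rank index
    return {'mean': statistics.fmean(ordered), 'p95': ordered[idx]}

stats = summarize([12, 15, 11, 90, 14, 13, 16, 12, 11, 15])
print(f"mean={stats['mean']:.1f}, p95={stats['p95']:.1f}")  # mean=20.9, p95=90.0
```

Note how a single outlier (90) barely moves the mean but dominates the p95, which is why tail percentiles are preferred for latency monitoring.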

Alerting

  • Every alert should be actionable
  • Use appropriate severity levels
  • Implement cooldowns to prevent alert storms
  • Route critical alerts to on-call systems
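A cooldown can be implemented as a simple gate in front of the alert sender. This is a sketch (class and method names are illustrative, not part of the `AlertManager` above): each alert name remembers when it last fired, and repeats within the window are suppressed.

```python
import time

class CooldownGate:
    """Suppress repeat alerts within a cooldown window (illustrative sketch)."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown = cooldown_seconds
        self._last_fired = {}  # alert name -> timestamp of last fire

    def should_fire(self, name: str, now: float = None) -> bool:
        """Return True if this alert may fire, recording the fire time."""
        now = time.time() if now is None else now
        last = self._last_fired.get(name)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down: suppress the repeat
        self._last_fired[name] = now
        return True

gate = CooldownGate(cooldown_seconds=300)
print(gate.should_fire('high_cpu', now=0))    # True: first occurrence fires
print(gate.should_fire('high_cpu', now=120))  # False: within the 300s window
print(gate.should_fire('high_cpu', now=400))  # True: window has elapsed
```

During an alert storm the condition may evaluate true on every metrics tick; the gate reduces that to one notification per cooldown window per alert name.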

Incident Response

  • Have runbooks for common issues
  • Document everything in the incident timeline
  • Conduct post-mortems to prevent recurrence
  • Focus on "how to prevent" not "who to blame"

Backup & Recovery

  • Define RPO (Recovery Point Objective) and RTO (Recovery Time Objective)
  • Test restores regularly
  • Encrypt backups at rest and in transit
  • Automate backup verification
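To make RPO concrete: with backups every 4 hours (the `0 */4 * * *` schedule used above), the worst-case data loss is one full interval. A hedged sketch of checking a schedule against an RPO target (the function name is illustrative):

```python
def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """Worst-case data loss equals the backup interval, so it must not exceed the RPO."""
    return backup_interval_hours <= rpo_hours

# A 4-hour backup cycle satisfies an 8-hour RPO but not a 1-hour RPO.
print(meets_rpo(4, 8))  # True
print(meets_rpo(4, 1))  # False
```

RTO is checked the same way against measured restore times, which is one reason regular restore tests matter: without them the actual RTO is unknown.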

Best Practices

  1. Design for failure - everything will break eventually
  2. Automate repetitive operational tasks
  3. Keep runbooks up to date
  4. Practice incident response with game days
  5. Monitor your monitoring (meta-monitoring)

Congratulations on completing all modules! Now proceed to the Capstone Project to bring everything together.

Capstone Project: Complete Quantitative Trading System

Course 3: Quantitative Finance & Portfolio Theory

Duration ~4-5 hours
Exercises 6 (3 guided + 3 open-ended)

Project Overview

You will build a complete quantitative trading system that integrates:

  1. Data Pipeline - Market data collection and storage
  2. Strategy Engine - Multi-strategy portfolio management
  3. Risk Management - Real-time risk monitoring and limits
  4. Execution - Order management with cost awareness
  5. Dashboard & Reporting - Performance visualization and reports

Learning Integration

This project draws from every module in the course:

Component Modules Used
Portfolio Optimization 4, 5, 6
Risk Management 7, 8, 9
Simulation & Analysis 10, 11
Dashboard & Reporting 12, 13
Execution 14, 15
Infrastructure 16, 17, 18

# Environment setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Callable
from enum import Enum
from scipy.optimize import minimize
import json
import warnings
warnings.filterwarnings('ignore')

# Display settings
pd.set_option('display.float_format', lambda x: f'{x:.4f}')
np.set_printoptions(precision=4)

print("Capstone Project: Complete Quantitative Trading System")
print("=" * 55)

Part 1: Data Pipeline

Build a data pipeline that:

  • Fetches market data from multiple sources
  • Calculates derived metrics (returns, volatility, etc.)
  • Stores data efficiently

class DataPipeline:
    """
    Market data pipeline for the trading system.
    
    Responsibilities:
    - Fetch historical and live market data
    - Calculate returns and risk metrics
    - Provide data to other system components
    """
    
    def __init__(self, universe: List[str]):
        self.universe = universe
        self.prices = pd.DataFrame()
        self.returns = pd.DataFrame()
        self.metadata = {}
        self.last_update = None
    
    def fetch_historical_data(self, start_date: str, end_date: str = None) -> pd.DataFrame:
        """Fetch historical price data."""
        end_date = end_date or datetime.now().strftime('%Y-%m-%d')
        
        print(f"Fetching data for {len(self.universe)} symbols...")
        data = yf.download(self.universe, start=start_date, end=end_date, progress=False)
        
        # Handle MultiIndex columns
        if isinstance(data.columns, pd.MultiIndex):
            if 'Adj Close' in data.columns.get_level_values(0):
                self.prices = data['Adj Close']
            elif 'Close' in data.columns.get_level_values(0):
                self.prices = data['Close']
        else:
            self.prices = data
        
        # Ensure proper column order
        self.prices = self.prices[self.universe]
        
        # Calculate returns
        self.returns = self.prices.pct_change().dropna()
        
        self.last_update = datetime.now()
        print(f"Loaded {len(self.prices)} days of data")
        
        return self.prices
    
    def calculate_metrics(self, lookback_days: int = 252) -> Dict:
        """Calculate key metrics for all assets."""
        if self.returns.empty:
            raise ValueError("No data loaded. Call fetch_historical_data first.")
        
        recent_returns = self.returns.tail(lookback_days)
        
        metrics = {}
        for symbol in self.universe:
            ret = recent_returns[symbol]
            metrics[symbol] = {
                'annual_return': ret.mean() * 252,
                'annual_volatility': ret.std() * np.sqrt(252),
                'sharpe_ratio': (ret.mean() * 252) / (ret.std() * np.sqrt(252)),
                'max_drawdown': self._calculate_max_drawdown(self.prices[symbol].tail(lookback_days)),
                'current_price': self.prices[symbol].iloc[-1]
            }
        
        return metrics
    
    def _calculate_max_drawdown(self, prices: pd.Series) -> float:
        """Calculate maximum drawdown."""
        cummax = prices.cummax()
        drawdown = (prices - cummax) / cummax
        return drawdown.min()
    
    def get_correlation_matrix(self, lookback_days: int = 252) -> pd.DataFrame:
        """Get correlation matrix."""
        return self.returns.tail(lookback_days).corr()
    
    def get_covariance_matrix(self, lookback_days: int = 252, annualize: bool = True) -> pd.DataFrame:
        """Get covariance matrix."""
        cov = self.returns.tail(lookback_days).cov()
        if annualize:
            cov = cov * 252
        return cov
    
    def get_latest_prices(self) -> pd.Series:
        """Get most recent prices."""
        return self.prices.iloc[-1]

# Initialize data pipeline
UNIVERSE = ['SPY', 'QQQ', 'IWM', 'EFA', 'EEM', 'TLT', 'GLD', 'VNQ']

pipeline = DataPipeline(UNIVERSE)
pipeline.fetch_historical_data('2020-01-01')

# Display metrics
metrics = pipeline.calculate_metrics()

print("\nAsset Metrics:")
print("=" * 70)
metrics_df = pd.DataFrame(metrics).T
print(metrics_df.to_string())

Exercise C.1: Data Quality Validator (Guided)

Add data quality validation to the data pipeline to check for missing values and outliers.

# Exercise C.1: Data Quality Validator (Guided)

def validate_data_quality(prices: pd.DataFrame, returns: pd.DataFrame) -> Dict:
    """
    Validate data quality and identify issues.
    
    Args:
        prices: Price DataFrame
        returns: Returns DataFrame
    
    Returns:
        Validation results with quality metrics
    """
    results = {
        'valid': True,
        'issues': [],
        'metrics': {}
    }
    
    # Check for missing values in prices
    # TODO: Count missing values per column
    missing_counts = prices.______().______()
    results['metrics']['missing_values'] = missing_counts.to_dict()
    
    # TODO: Check if any column has missing values
    if missing_counts.______() > 0:
        results['issues'].append('Missing values detected')
    
    # Check for outliers in returns (>5 standard deviations)
    outlier_threshold = 5
    outlier_counts = {}
    
    for col in returns.columns:
        # TODO: Calculate mean and standard deviation
        mean = returns[col].______()
        std = returns[col].______()
        
        # TODO: Count outliers beyond threshold
        outliers = ((returns[col] - mean).______() > outlier_threshold * std).sum()
        outlier_counts[col] = ______
    
    results['metrics']['outlier_counts'] = outlier_counts
    
    # TODO: Check if total outliers exceed threshold
    total_outliers = ______(outlier_counts.values())
    if total_outliers > len(returns) * 0.01:  # More than 1% outliers
        results['issues'].______(f'High outlier count: {total_outliers}')
    
    # Check data coverage
    # TODO: Calculate trading days
    trading_days = ______(prices)
    results['metrics']['trading_days'] = trading_days
    
    # Set overall validity
    results['valid'] = len(results['issues']) == 0
    
    return results

# Test
validation = validate_data_quality(pipeline.prices, pipeline.returns)
print(f"Data Valid: {validation['valid']}")
print(f"Trading Days: {validation['metrics']['trading_days']}")
print(f"Issues: {validation['issues']}")
Click for solution
def validate_data_quality(prices: pd.DataFrame, returns: pd.DataFrame) -> Dict:
    """
    Validate data quality and identify issues.

    Args:
        prices: Price DataFrame
        returns: Returns DataFrame

    Returns:
        Validation results with quality metrics
    """
    results = {
        'valid': True,
        'issues': [],
        'metrics': {}
    }

    missing_counts = prices.isna().sum()
    results['metrics']['missing_values'] = missing_counts.to_dict()

    if missing_counts.sum() > 0:
        results['issues'].append('Missing values detected')

    outlier_threshold = 5
    outlier_counts = {}

    for col in returns.columns:
        mean = returns[col].mean()
        std = returns[col].std()

        outliers = ((returns[col] - mean).abs() > outlier_threshold * std).sum()
        outlier_counts[col] = outliers

    results['metrics']['outlier_counts'] = outlier_counts

    total_outliers = sum(outlier_counts.values())
    if total_outliers > len(returns) * 0.01:
        results['issues'].append(f'High outlier count: {total_outliers}')

    trading_days = len(prices)
    results['metrics']['trading_days'] = trading_days

    results['valid'] = len(results['issues']) == 0

    return results

Part 2: Strategy Engine

Build a multi-strategy engine that:

  • Implements multiple portfolio optimization strategies
  • Manages strategy weights and allocation
  • Generates trading signals

class Strategy:
    """Base class for trading strategies."""
    
    def __init__(self, name: str):
        self.name = name
    
    def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
        """Calculate target weights. Override in subclass."""
        raise NotImplementedError

class MeanVarianceStrategy(Strategy):
    """Mean-Variance Optimization Strategy."""
    
    def __init__(self, target_return: float = 0.10):
        super().__init__("Mean-Variance")
        self.target_return = target_return
    
    def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
        returns = data.returns
        n_assets = len(data.universe)
        
        mu = returns.mean().values * 252
        cov = returns.cov().values * 252
        
        def objective(w):
            return w @ cov @ w
        
        constraints = [
            {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
            {'type': 'ineq', 'fun': lambda w: w @ mu - self.target_return}
        ]
        bounds = [(0, 0.3) for _ in range(n_assets)]
        
        result = minimize(objective, np.ones(n_assets)/n_assets,
                         method='SLSQP', bounds=bounds, constraints=constraints)
        
        return dict(zip(data.universe, result.x))

class RiskParityStrategy(Strategy):
    """Risk Parity Strategy."""
    
    def __init__(self):
        super().__init__("Risk Parity")
    
    def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
        returns = data.returns
        n_assets = len(data.universe)
        
        cov = returns.cov().values * 252
        target_risk = 1 / n_assets
        
        def objective(w):
            port_vol = np.sqrt(w @ cov @ w)
            marginal_contrib = cov @ w
            risk_contrib = w * marginal_contrib / port_vol
            return np.sum((risk_contrib - target_risk)**2)
        
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        bounds = [(0.01, 0.5) for _ in range(n_assets)]
        
        result = minimize(objective, np.ones(n_assets)/n_assets,
                         method='SLSQP', bounds=bounds, constraints=constraints)
        
        return dict(zip(data.universe, result.x))

class MinimumVolatilityStrategy(Strategy):
    """Minimum Volatility Strategy."""
    
    def __init__(self):
        super().__init__("Minimum Volatility")
    
    def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
        returns = data.returns
        n_assets = len(data.universe)
        
        cov = returns.cov().values * 252
        
        def objective(w):
            return np.sqrt(w @ cov @ w)
        
        constraints = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1}]
        bounds = [(0, 0.4) for _ in range(n_assets)]
        
        result = minimize(objective, np.ones(n_assets)/n_assets,
                         method='SLSQP', bounds=bounds, constraints=constraints)
        
        return dict(zip(data.universe, result.x))


class StrategyEngine:
    """
    Multi-strategy portfolio engine.
    """
    
    def __init__(self, data_pipeline: DataPipeline):
        self.data = data_pipeline
        self.strategies: Dict[str, Strategy] = {}
        self.strategy_allocations: Dict[str, float] = {}
        self.combined_weights: Dict[str, float] = {}
    
    def add_strategy(self, strategy: Strategy, allocation: float):
        """Add a strategy with given allocation."""
        self.strategies[strategy.name] = strategy
        self.strategy_allocations[strategy.name] = allocation
    
    def calculate_all_weights(self) -> Dict[str, Dict[str, float]]:
        """Calculate weights for all strategies."""
        all_weights = {}
        
        for name, strategy in self.strategies.items():
            try:
                weights = strategy.calculate_weights(self.data)
                all_weights[name] = weights
            except Exception as e:
                print(f"Error calculating {name} weights: {e}")
                n = len(self.data.universe)
                all_weights[name] = {s: 1/n for s in self.data.universe}
        
        return all_weights
    
    def combine_weights(self) -> Dict[str, float]:
        """Combine strategy weights based on allocations."""
        all_weights = self.calculate_all_weights()
        
        combined = {symbol: 0.0 for symbol in self.data.universe}
        
        for strategy_name, strategy_weights in all_weights.items():
            allocation = self.strategy_allocations.get(strategy_name, 0)
            for symbol, weight in strategy_weights.items():
                combined[symbol] += weight * allocation
        
        total = sum(combined.values())
        if total > 0:
            combined = {s: w/total for s, w in combined.items()}
        
        self.combined_weights = combined
        return combined
    
    def get_strategy_comparison(self) -> pd.DataFrame:
        """Get comparison of all strategy weights."""
        all_weights = self.calculate_all_weights()
        all_weights['Combined'] = self.combine_weights()
        
        return pd.DataFrame(all_weights)

# Create strategy engine
engine = StrategyEngine(pipeline)

# Add strategies with allocations
engine.add_strategy(MeanVarianceStrategy(target_return=0.08), 0.40)
engine.add_strategy(RiskParityStrategy(), 0.35)
engine.add_strategy(MinimumVolatilityStrategy(), 0.25)

# Calculate and display weights
comparison = engine.get_strategy_comparison()

print("\nStrategy Weight Comparison:")
print("=" * 60)
print(comparison.round(3).to_string())

Exercise C.2: Strategy Performance Tracker (Guided)

Build a function that tracks and compares historical performance of each strategy.

# Exercise C.2: Strategy Performance Tracker (Guided)

def calculate_strategy_performance(returns: pd.DataFrame, 
                                   strategy_weights: Dict[str, Dict[str, float]]) -> Dict:
    """
    Calculate historical performance for each strategy.
    
    Args:
        returns: Asset returns DataFrame
        strategy_weights: Dict of {strategy_name: {symbol: weight}}
    
    Returns:
        Performance metrics for each strategy
    """
    performance = {}
    
    for strategy_name, weights in strategy_weights.items():
        # TODO: Convert weights to array in same order as returns columns
        weight_array = np.array([weights.______(col, 0) for col in returns.______])
        
        # TODO: Calculate portfolio returns
        port_returns = (returns * ______).sum(axis=1)
        
        # TODO: Calculate cumulative returns
        cumulative = (1 + port_returns).______()
        
        # Calculate drawdown
        running_max = cumulative.cummax()
        drawdown = (cumulative - running_max) / running_max
        
        # TODO: Calculate annual return
        annual_return = port_returns.______() * 252
        # TODO: Calculate annual volatility
        annual_vol = port_returns.______() * np.sqrt(252)
        
        performance[strategy_name] = {
            # TODO: Calculate total return
            'total_return': cumulative.______[-1] - 1,
            'annual_return': annual_return,
            'annual_volatility': annual_vol,
            # TODO: Calculate Sharpe ratio
            'sharpe_ratio': annual_return / ______ if annual_vol > 0 else 0,
            'max_drawdown': drawdown.min()
        }
    
    return performance

# Test
strategy_weights = engine.calculate_all_weights()
perf = calculate_strategy_performance(pipeline.returns, strategy_weights)

print("Strategy Performance:")
print("=" * 60)
for name, metrics in perf.items():
    print(f"\n{name}:")
    print(f"  Total Return: {metrics['total_return']:.2%}")
    print(f"  Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
    print(f"  Max Drawdown: {metrics['max_drawdown']:.2%}")
Click for solution
def calculate_strategy_performance(returns: pd.DataFrame, 
                                   strategy_weights: Dict[str, Dict[str, float]]) -> Dict:
    """
    Calculate historical performance for each strategy.

    Args:
        returns: Asset returns DataFrame
        strategy_weights: Dict of {strategy_name: {symbol: weight}}

    Returns:
        Performance metrics for each strategy
    """
    performance = {}

    for strategy_name, weights in strategy_weights.items():
        weight_array = np.array([weights.get(col, 0) for col in returns.columns])

        port_returns = (returns * weight_array).sum(axis=1)

        cumulative = (1 + port_returns).cumprod()

        running_max = cumulative.cummax()
        drawdown = (cumulative - running_max) / running_max

        annual_return = port_returns.mean() * 252
        annual_vol = port_returns.std() * np.sqrt(252)

        performance[strategy_name] = {
            'total_return': cumulative.iloc[-1] - 1,
            'annual_return': annual_return,
            'annual_volatility': annual_vol,
            'sharpe_ratio': annual_return / annual_vol if annual_vol > 0 else 0,
            'max_drawdown': drawdown.min()
        }

    return performance

Part 3: Risk Management

Build a risk management system that:

  • Monitors portfolio risk in real time
  • Enforces risk limits
  • Calculates VaR and other risk metrics

@dataclass
class RiskLimits:
    """Risk limit configuration."""
    max_position_size: float = 0.25
    max_sector_exposure: float = 0.40
    max_portfolio_var: float = 0.02
    max_drawdown: float = 0.15
    min_cash: float = 0.05

class RiskManager:
    """
    Portfolio risk management system.
    """
    
    def __init__(self, data_pipeline: DataPipeline, limits: RiskLimits = None):
        self.data = data_pipeline
        self.limits = limits or RiskLimits()
        self.alerts = []
    
    def calculate_portfolio_var(self, weights: Dict[str, float], 
                                confidence: float = 0.95,
                                method: str = 'historical') -> float:
        """Calculate portfolio Value at Risk."""
        weight_array = np.array([weights.get(s, 0) for s in self.data.universe])
        port_returns = (self.data.returns * weight_array).sum(axis=1)
        
        if method == 'historical':
            var = -np.percentile(port_returns, (1 - confidence) * 100)
        elif method == 'parametric':
            from scipy.stats import norm
            mu = port_returns.mean()
            sigma = port_returns.std()
            var = -(mu + norm.ppf(1 - confidence) * sigma)
        else:
            raise ValueError(f"Unknown method: {method}")
        
        return var
    
    def calculate_portfolio_cvar(self, weights: Dict[str, float],
                                  confidence: float = 0.95) -> float:
        """Calculate Conditional Value at Risk."""
        weight_array = np.array([weights.get(s, 0) for s in self.data.universe])
        port_returns = (self.data.returns * weight_array).sum(axis=1)
        
        var = self.calculate_portfolio_var(weights, confidence)
        cvar = -port_returns[port_returns <= -var].mean()
        
        return cvar
    
    def calculate_risk_metrics(self, weights: Dict[str, float]) -> Dict:
        """Calculate comprehensive risk metrics."""
        weight_array = np.array([weights.get(s, 0) for s in self.data.universe])
        port_returns = (self.data.returns * weight_array).sum(axis=1)
        
        cov = self.data.get_covariance_matrix()
        port_vol = np.sqrt(weight_array @ cov.values @ weight_array)
        
        port_return = (self.data.returns.mean() * weight_array).sum() * 252
        
        cumulative = (1 + port_returns).cumprod()
        running_max = cumulative.cummax()
        drawdown = (cumulative - running_max) / running_max
        max_drawdown = drawdown.min()
        
        return {
            'annual_return': port_return,
            'annual_volatility': port_vol,
            'sharpe_ratio': port_return / port_vol if port_vol > 0 else 0,
            'var_95': self.calculate_portfolio_var(weights, 0.95),
            'cvar_95': self.calculate_portfolio_cvar(weights, 0.95),
            'max_drawdown': max_drawdown,
            'current_drawdown': drawdown.iloc[-1]
        }
    
    def check_limits(self, weights: Dict[str, float]) -> List[Dict]:
        """Check if weights violate any risk limits."""
        violations = []
        
        for symbol, weight in weights.items():
            if weight > self.limits.max_position_size:
                violations.append({
                    'type': 'position_size',
                    'symbol': symbol,
                    'value': weight,
                    'limit': self.limits.max_position_size,
                    'message': f"{symbol} weight {weight:.1%} exceeds limit {self.limits.max_position_size:.1%}"
                })
        
        var = self.calculate_portfolio_var(weights, 0.95)
        if var > self.limits.max_portfolio_var:
            violations.append({
                'type': 'var',
                'value': var,
                'limit': self.limits.max_portfolio_var,
                'message': f"Portfolio VaR {var:.2%} exceeds limit {self.limits.max_portfolio_var:.2%}"
            })
        
        self.alerts = violations
        return violations
    
    def get_risk_report(self, weights: Dict[str, float]) -> str:
        """Generate risk report."""
        metrics = self.calculate_risk_metrics(weights)
        violations = self.check_limits(weights)
        
        report = []
        report.append("="*50)
        report.append("RISK MANAGEMENT REPORT")
        report.append("="*50)
        report.append("")
        report.append("Portfolio Risk Metrics:")
        report.append("-"*30)
        report.append(f"  Expected Return:  {metrics['annual_return']:.2%}")
        report.append(f"  Volatility:       {metrics['annual_volatility']:.2%}")
        report.append(f"  Sharpe Ratio:     {metrics['sharpe_ratio']:.2f}")
        report.append(f"  VaR (95%):        {metrics['var_95']:.2%}")
        report.append(f"  Max Drawdown:     {metrics['max_drawdown']:.2%}")
        report.append("")
        report.append("Limit Violations:")
        report.append("-"*30)
        if violations:
            for v in violations:
                report.append(f"  ⚠️ {v['message']}")
        else:
            report.append("  ✓ All limits within bounds")
        report.append("")
        report.append("="*50)
        
        return "\n".join(report)

# Create risk manager
risk_manager = RiskManager(pipeline)

# Get combined weights
combined_weights = engine.combine_weights()

# Generate risk report
print(risk_manager.get_risk_report(combined_weights))

Exercise C.3: Dynamic Risk Adjustment (Guided)

Create a function that adjusts portfolio weights to respect risk limits.

# Exercise C.3: Dynamic Risk Adjustment (Guided)

def adjust_weights_for_risk(weights: Dict[str, float], 
                           risk_manager: RiskManager,
                           max_iterations: int = 10) -> Dict:
    """
    Adjust weights to meet risk limits.
    
    Args:
        weights: Target portfolio weights
        risk_manager: RiskManager instance
        max_iterations: Max adjustment iterations
    
    Returns:
        Adjusted weights and adjustment info
    """
    # TODO: Create copy of weights
    adjusted = weights.______()
    adjustments_made = []
    
    for iteration in range(max_iterations):
        # TODO: Check current limits
        violations = risk_manager.______(adjusted)
        
        # TODO: Exit if no violations
        if not ______:
            break
        
        for violation in violations:
            if violation['type'] == 'position_size':
                symbol = violation['symbol']
                # TODO: Get limit from violation
                limit = violation[______]
                
                # Calculate excess
                excess = adjusted[symbol] - limit
                # TODO: Cap at limit
                adjusted[symbol] = ______
                
                # Redistribute excess to other positions
                other_symbols = [s for s in adjusted if s != symbol]
                # TODO: Calculate redistribution per symbol
                per_symbol = excess / ______(other_symbols)
                
                for s in other_symbols:
                    adjusted[s] += per_symbol
                
                adjustments_made.append(f"Capped {symbol} at {limit:.1%}")
            
            elif violation['type'] == 'var':
                # Scale down all positions
                scale_factor = 0.9
                for symbol in adjusted:
                    # TODO: Scale down each weight
                    adjusted[symbol] ______ scale_factor
                
                adjustments_made.append(f"Scaled down by {1-scale_factor:.1%}")
    
    # Normalize
    total = sum(adjusted.values())
    if total > 0:
        adjusted = {s: w/total for s, w in adjusted.items()}
    
    return {
        'original_weights': weights,
        'adjusted_weights': adjusted,
        'adjustments': adjustments_made,
        'iterations': iteration + 1
    }

# Test
result = adjust_weights_for_risk(combined_weights, risk_manager)
print(f"Adjustments made: {len(result['adjustments'])}")
print(f"Iterations: {result['iterations']}")
if result['adjustments']:
    for adj in result['adjustments']:
        print(f"  - {adj}")
Click for solution
def adjust_weights_for_risk(weights: Dict[str, float], 
                           risk_manager: RiskManager,
                           max_iterations: int = 10) -> Dict:
    """
    Adjust weights to meet risk limits.

    Args:
        weights: Target portfolio weights
        risk_manager: RiskManager instance
        max_iterations: Max adjustment iterations

    Returns:
        Adjusted weights and adjustment info
    """
    adjusted = weights.copy()
    adjustments_made = []

    for iteration in range(max_iterations):
        violations = risk_manager.check_limits(adjusted)

        if not violations:
            break

        for violation in violations:
            if violation['type'] == 'position_size':
                symbol = violation['symbol']
                limit = violation['limit']

                excess = adjusted[symbol] - limit
                adjusted[symbol] = limit

                other_symbols = [s for s in adjusted if s != symbol]
                per_symbol = excess / len(other_symbols)

                for s in other_symbols:
                    adjusted[s] += per_symbol

                adjustments_made.append(f"Capped {symbol} at {limit:.1%}")

            elif violation['type'] == 'var':
                scale_factor = 0.9
                for symbol in adjusted:
                    adjusted[symbol] *= scale_factor

                adjustments_made.append(f"Scaled down by {1-scale_factor:.1%}")

    total = sum(adjusted.values())
    if total > 0:
        adjusted = {s: w/total for s, w in adjusted.items()}

    return {
        'original_weights': weights,
        'adjusted_weights': adjusted,
        'adjustments': adjustments_made,
        'iterations': iteration + 1
    }
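Note that capping one position and redistributing its excess can push *other* positions over their own limits, which is exactly why the function loops up to `max_iterations` times. A minimal standalone sketch of one cap-and-redistribute pass (tickers and the 35% limit are illustrative, not tied to the `RiskManager` above):

```python
def cap_position(weights, symbol, limit):
    """Cap one position at `limit` and spread the excess equally
    across the remaining positions."""
    adjusted = weights.copy()
    excess = adjusted[symbol] - limit
    adjusted[symbol] = limit
    others = [s for s in adjusted if s != symbol]
    for s in others:
        adjusted[s] += excess / len(others)
    return adjusted

weights = {'AAPL': 0.50, 'MSFT': 0.30, 'GOOG': 0.20}
adjusted = cap_position(weights, 'AAPL', 0.35)
# AAPL capped at 35%; the 15% excess splits between MSFT and GOOG.
# If either now breaches its own limit, a second pass is needed.
print(adjusted)  # {'AAPL': 0.35, 'MSFT': 0.375, 'GOOG': 0.275}
```

Total weight is preserved by construction, so the final normalization step mainly handles the VaR scale-downs.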

Part 4: Execution Engine

Build an execution system that:

  • Calculates the trades required to reach target weights
  • Estimates transaction costs
  • Manages order generation

@dataclass
class Trade:
    """Represents a trade order."""
    symbol: str
    side: str
    quantity: int
    price: float
    value: float
    reason: str = ""

class ExecutionEngine:
    """
    Trade execution engine.
    """
    
    def __init__(self, data_pipeline: DataPipeline, 
                 transaction_cost_bps: float = 10,
                 min_trade_value: float = 1000):
        self.data = data_pipeline
        self.transaction_cost_bps = transaction_cost_bps
        self.min_trade_value = min_trade_value
        self.pending_trades: List[Trade] = []
    
    def calculate_trades(self, current_holdings: Dict[str, int],
                        target_weights: Dict[str, float],
                        portfolio_value: float) -> List[Trade]:
        """Calculate trades to reach target weights."""
        prices = self.data.get_latest_prices()
        trades = []
        
        for symbol in target_weights:
            current_shares = current_holdings.get(symbol, 0)
            current_value = current_shares * prices[symbol]
            current_weight = current_value / portfolio_value if portfolio_value > 0 else 0
            
            target_weight = target_weights.get(symbol, 0)
            target_value = target_weight * portfolio_value
            target_shares = int(target_value / prices[symbol])
            
            shares_diff = target_shares - current_shares
            trade_value = abs(shares_diff * prices[symbol])
            
            if trade_value >= self.min_trade_value:
                trade = Trade(
                    symbol=symbol,
                    side='buy' if shares_diff > 0 else 'sell',
                    quantity=abs(shares_diff),
                    price=prices[symbol],
                    value=trade_value,
                    reason=f"Rebalance: {current_weight:.1%} -> {target_weight:.1%}"
                )
                trades.append(trade)
        
        # Sells sort before buys (False < True), largest trade value first within each side
        trades.sort(key=lambda t: (t.side == 'buy', -t.value))
        self.pending_trades = trades
        return trades
    
    def estimate_costs(self, trades: List[Trade] = None) -> Dict:
        """Estimate transaction costs (defaults to pending trades)."""
        if trades is None:
            trades = self.pending_trades
        
        total_value = sum(t.value for t in trades)
        total_cost = total_value * (self.transaction_cost_bps / 10000)
        
        return {
            'num_trades': len(trades),
            'total_value': total_value,
            'estimated_cost': total_cost,
            'cost_bps': total_cost / total_value * 10000 if total_value > 0 else 0
        }
    
    def get_trade_summary(self) -> pd.DataFrame:
        """Get summary of pending trades."""
        if not self.pending_trades:
            return pd.DataFrame()
        
        return pd.DataFrame([
            {
                'Symbol': t.symbol,
                'Side': t.side.upper(),
                'Quantity': t.quantity,
                'Price': t.price,
                'Value': t.value
            }
            for t in self.pending_trades
        ])

# Create execution engine
execution = ExecutionEngine(pipeline, transaction_cost_bps=10)

# Simulate current holdings
portfolio_value = 1_000_000
prices = pipeline.get_latest_prices()

initial_weight = 1 / len(UNIVERSE)
current_holdings = {}
for symbol in UNIVERSE:
    target_value = portfolio_value * initial_weight
    current_holdings[symbol] = int(target_value / prices[symbol])

# Calculate trades
trades = execution.calculate_trades(current_holdings, combined_weights, portfolio_value)

print("\nTrade Summary:")
print("=" * 60)
print(execution.get_trade_summary().to_string(index=False))

print("\nCost Estimate:")
costs = execution.estimate_costs()
print(f"  Total value: ${costs['total_value']:,.0f}")
print(f"  Estimated cost: ${costs['estimated_cost']:,.2f}")

Exercise C.4: Complete Trading System (Open-ended)

Integrate all components into a unified trading system class.

# Exercise C.4: Complete Trading System (Open-ended)
#
# Build a TradingSystem class that:
# - Integrates data pipeline, strategy engine, risk manager, and execution engine
# - Implements a run_cycle() method that:
#   1. Updates market data
#   2. Calculates target weights from strategies
#   3. Checks risk limits and adjusts if needed
#   4. Generates trades and estimates costs
# - Tracks portfolio state (holdings, value, PnL)
# - Generates comprehensive reports
#
# Your implementation:
Click for solution
class TradingSystem:
    """
    Complete quantitative trading system.
    """

    def __init__(self, name: str, universe: List[str]):
        self.name = name
        self.universe = universe

        # Components
        self.data = DataPipeline(universe)
        self.strategies = StrategyEngine(self.data)
        self.risk = RiskManager(self.data)
        self.execution = ExecutionEngine(self.data)

        # State
        self.holdings: Dict[str, int] = {}
        self.portfolio_value = 0
        self.target_weights: Dict[str, float] = {}
        self.is_initialized = False

    def initialize(self, start_date: str, initial_capital: float):
        """Initialize the trading system."""
        print(f"Initializing {self.name}...")

        self.data.fetch_historical_data(start_date)
        self.portfolio_value = initial_capital
        self.holdings = {symbol: 0 for symbol in self.universe}
        self.is_initialized = True
        print(f"System initialized with ${initial_capital:,.0f}")

    def add_strategy(self, strategy: Strategy, allocation: float):
        """Add a strategy to the system."""
        self.strategies.add_strategy(strategy, allocation)

    def run_cycle(self) -> Dict:
        """Run one trading cycle."""
        if not self.is_initialized:
            raise RuntimeError("System not initialized")

        result = {
            'timestamp': datetime.now().isoformat(),
            'status': 'success',
            'actions': []
        }

        # Calculate target weights
        self.target_weights = self.strategies.combine_weights()
        result['target_weights'] = self.target_weights.copy()
        result['actions'].append('Calculated target weights')

        # Check risk limits
        violations = self.risk.check_limits(self.target_weights)
        result['risk_violations'] = len(violations)

        if violations:
            result['actions'].append(f'Found {len(violations)} risk violations')
            result['status'] = 'risk_alert'
        else:
            result['actions'].append('Risk limits OK')

        # Calculate trades
        trades = self.execution.calculate_trades(
            self.holdings,
            self.target_weights,
            self.portfolio_value
        )
        result['num_trades'] = len(trades)
        result['actions'].append(f'Generated {len(trades)} trades')

        # Estimate costs
        result['estimated_costs'] = self.execution.estimate_costs()

        # Risk metrics
        result['risk_metrics'] = self.risk.calculate_risk_metrics(self.target_weights)

        return result

    def generate_report(self) -> str:
        """Generate comprehensive system report."""
        report = []
        report.append("="*60)
        report.append(f"TRADING SYSTEM REPORT: {self.name}")
        report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        report.append("="*60)
        report.append("")

        report.append("PORTFOLIO SUMMARY")
        report.append("-"*40)
        report.append(f"Total Value: ${self.portfolio_value:,.0f}")
        report.append(f"Universe: {len(self.universe)} assets")
        report.append(f"Strategies: {len(self.strategies.strategies)}")
        report.append("")

        report.append("TARGET ALLOCATION")
        report.append("-"*40)
        for symbol, weight in sorted(self.target_weights.items(), key=lambda x: -x[1]):
            report.append(f"  {symbol}: {weight:.1%}")
        report.append("")

        if self.target_weights:
            metrics = self.risk.calculate_risk_metrics(self.target_weights)
            report.append("RISK METRICS")
            report.append("-"*40)
            report.append(f"  Expected Return: {metrics['annual_return']:.2%}")
            report.append(f"  Volatility: {metrics['annual_volatility']:.2%}")
            report.append(f"  Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
            report.append(f"  VaR (95%): {metrics['var_95']:.2%}")

        report.append("")
        report.append("="*60)

        return "\n".join(report)

# Test
system = TradingSystem("Capstone System", UNIVERSE)
system.initialize('2020-01-01', 1_000_000)
system.add_strategy(MeanVarianceStrategy(target_return=0.08), 0.40)
system.add_strategy(RiskParityStrategy(), 0.35)
system.add_strategy(MinimumVolatilityStrategy(), 0.25)

result = system.run_cycle()
print(f"Status: {result['status']}")
print(f"Trades: {result['num_trades']}")
print(system.generate_report())

Exercise C.5: Performance Dashboard (Open-ended)

Create a performance visualization dashboard for the trading system.

# Exercise C.5: Performance Dashboard (Open-ended)
#
# Build a function that creates a 4-panel visualization:
# - Panel 1: Cumulative returns (portfolio vs benchmark)
# - Panel 2: Drawdown chart
# - Panel 3: Portfolio allocation pie chart
# - Panel 4: Rolling Sharpe ratio
#
# The function should also print summary statistics:
# - Total return, alpha, max drawdown, Sharpe ratio
#
# Your implementation:
Click for solution
def create_performance_dashboard(returns: pd.DataFrame, 
                                 weights: Dict[str, float],
                                 benchmark_symbol: str = 'SPY'):
    """
    Create a comprehensive performance dashboard.

    Args:
        returns: Asset returns DataFrame
        weights: Portfolio weights
        benchmark_symbol: Benchmark ticker
    """
    # Calculate portfolio returns
    weight_array = np.array([weights.get(s, 0) for s in returns.columns])
    port_returns = (returns * weight_array).sum(axis=1)
    port_cumulative = (1 + port_returns).cumprod()

    # Benchmark returns
    bench_returns = returns[benchmark_symbol]
    bench_cumulative = (1 + bench_returns).cumprod()

    # Drawdown
    running_max = port_cumulative.cummax()
    drawdown = (port_cumulative - running_max) / running_max * 100

    # Rolling Sharpe
    rolling_sharpe = port_returns.rolling(63).mean() / port_returns.rolling(63).std() * np.sqrt(252)

    # Create figure
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # Panel 1: Cumulative returns
    axes[0, 0].plot(port_cumulative.index, port_cumulative, label='Portfolio', linewidth=2)
    axes[0, 0].plot(bench_cumulative.index, bench_cumulative, label=benchmark_symbol, linewidth=2, alpha=0.7)
    axes[0, 0].set_title('Cumulative Returns')
    axes[0, 0].set_ylabel('Growth of $1')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)

    # Panel 2: Drawdown
    axes[0, 1].fill_between(drawdown.index, 0, drawdown, alpha=0.7, color='red')
    axes[0, 1].set_title('Drawdown')
    axes[0, 1].set_ylabel('Drawdown (%)')
    axes[0, 1].grid(True, alpha=0.3)

    # Panel 3: Allocation
    sorted_weights = dict(sorted(weights.items(), key=lambda x: -x[1]))
    colors = plt.cm.Set3(np.linspace(0, 1, len(sorted_weights)))
    axes[1, 0].pie(sorted_weights.values(), labels=sorted_weights.keys(), 
                   autopct='%1.1f%%', colors=colors)
    axes[1, 0].set_title('Target Allocation')

    # Panel 4: Rolling Sharpe
    axes[1, 1].plot(rolling_sharpe.index, rolling_sharpe, linewidth=1)
    axes[1, 1].axhline(rolling_sharpe.mean(), color='red', linestyle='--', 
                       label=f'Mean: {rolling_sharpe.mean():.2f}')
    axes[1, 1].set_title('Rolling Sharpe Ratio (63-day)')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Print summary
    print("\nPerformance Summary:")
    print("="*50)
    print(f"Total Return: {(port_cumulative.iloc[-1] - 1):.2%}")
    print(f"Benchmark Return: {(bench_cumulative.iloc[-1] - 1):.2%}")
    print(f"Alpha (simple excess return): {(port_cumulative.iloc[-1] - bench_cumulative.iloc[-1]):.2%}")
    print(f"Max Drawdown: {drawdown.min():.2%}")
    print(f"Sharpe Ratio: {port_returns.mean() / port_returns.std() * np.sqrt(252):.2f}")

# Test
create_performance_dashboard(pipeline.returns, combined_weights)
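The drawdown panel uses the standard running-peak definition: at each point, how far the portfolio sits below its highest value so far. A tiny worked example with hypothetical growth values (1.0 = starting capital):

```python
import pandas as pd

# Drawdown from a cumulative-growth series: percentage distance
# below the running peak.
cumulative = pd.Series([1.00, 1.10, 1.05, 1.20, 0.96])
running_max = cumulative.cummax()          # 1.00, 1.10, 1.10, 1.20, 1.20
drawdown = (cumulative - running_max) / running_max

print(f"Max drawdown: {drawdown.min():.1%}")  # Max drawdown: -20.0%
```

The final value of 0.96 against the 1.20 peak gives the worst drawdown, even though the series is above its starting value at the peak.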

Exercise C.6: System Extensions (Open-ended)

Extend the trading system with additional features.

# Exercise C.6: System Extensions (Open-ended)
#
# Choose ONE of the following extensions to implement:
#
# Option A: Momentum Strategy
# - Implement a momentum strategy that ranks assets by recent performance
# - Weight assets based on momentum score
# - Add to the strategy engine
#
# Option B: Portfolio Rebalancing Logic
# - Implement a rebalancing trigger (drift-based or calendar-based)
# - Only generate trades when rebalancing is triggered
# - Track rebalancing history
#
# Option C: Performance Attribution
# - Implement Brinson attribution (allocation + selection effects)
# - Break down performance by asset contribution
# - Generate attribution reports
#
# Your implementation:
Click for solution (Option A: Momentum Strategy)
class MomentumStrategy(Strategy):
    """
    Momentum-based strategy.

    Ranks assets by recent performance and overweights winners.
    """

    def __init__(self, lookback_days: int = 252, top_n: int = 4):
        super().__init__("Momentum")
        self.lookback_days = lookback_days
        self.top_n = top_n

    def calculate_weights(self, data: DataPipeline) -> Dict[str, float]:
        # Calculate momentum scores (total return over lookback)
        recent_prices = data.prices.tail(self.lookback_days)
        momentum = (recent_prices.iloc[-1] / recent_prices.iloc[0]) - 1

        # Rank and select top performers
        rankings = momentum.rank(ascending=False)

        # Weight based on momentum score
        weights = {}
        total_momentum = 0

        for symbol in data.universe:
            if rankings[symbol] <= self.top_n:
                # Only invest in top N performers
                score = max(momentum[symbol], 0.01)  # Floor at small positive
                weights[symbol] = score
                total_momentum += score
            else:
                weights[symbol] = 0

        # Normalize weights
        if total_momentum > 0:
            weights = {s: w / total_momentum for s, w in weights.items()}
        else:
            # Equal weight fallback
            n = len(data.universe)
            weights = {s: 1/n for s in data.universe}

        return weights

# Test the momentum strategy
momentum_strat = MomentumStrategy(lookback_days=126, top_n=4)
momentum_weights = momentum_strat.calculate_weights(pipeline)

print("Momentum Strategy Weights:")
print("="*40)
for symbol, weight in sorted(momentum_weights.items(), key=lambda x: -x[1]):
    if weight > 0:
        print(f"  {symbol}: {weight:.1%}")

# Add to strategy engine
engine.add_strategy(momentum_strat, 0.20)  # 20% allocation
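For readers attempting Option B instead, the core of a drift-based trigger fits in a few lines. This is a hedged sketch only: the 5% threshold and tickers are illustrative, and a full solution would also track rebalancing history and wire the check into `run_cycle()`:

```python
def needs_rebalance(current_weights, target_weights, threshold=0.05):
    """Return True if any position has drifted from its target
    weight by more than `threshold` (absolute)."""
    return any(
        abs(current_weights.get(s, 0) - w) > threshold
        for s, w in target_weights.items()
    )

targets = {'AAPL': 0.25, 'MSFT': 0.25, 'GOOG': 0.25, 'AMZN': 0.25}
drifted = {'AAPL': 0.32, 'MSFT': 0.23, 'GOOG': 0.24, 'AMZN': 0.21}

print(needs_rebalance(drifted, targets))  # True  (AAPL drifted 7 points)
print(needs_rebalance(targets, targets))  # False
```

Gating trade generation on this check avoids paying transaction costs for trivially small weight deviations.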

Capstone Completion Checklist

Core Components

  • [ ] Data pipeline fetches and processes market data
  • [ ] Multiple strategies implemented and combined
  • [ ] Risk management with VaR, limits, and alerts
  • [ ] Execution engine calculates trades and costs
  • [ ] System integration with run cycle

Exercises Completed

  • [ ] C.1: Data Quality Validator
  • [ ] C.2: Strategy Performance Tracker
  • [ ] C.3: Dynamic Risk Adjustment
  • [ ] C.4: Complete Trading System
  • [ ] C.5: Performance Dashboard
  • [ ] C.6: System Extensions

Congratulations!

You've completed Course 3: Quantitative Finance & Portfolio Theory!

What You've Built

A complete quantitative trading system with:

  • Multi-source data pipeline
  • Multi-strategy portfolio optimization
  • Real-time risk management
  • Cost-aware execution
  • Performance visualization

Key Skills Developed

  • Portfolio Theory: Mean-variance, risk parity, factor models
  • Risk Management: VaR, CVaR, stress testing, limits
  • System Design: Component integration, state management
  • Production Skills: Monitoring, execution, deployment

Next Steps

  1. Paper Trade: Test your system with simulated trading
  2. Iterate: Refine strategies based on performance
  3. Deploy: Move to cloud infrastructure (see Modules 17-18)
  4. Continue Learning: Course 4 (ML for Finance) awaits!

Good luck on your quantitative trading journey!